Abstract

Estimation of distribution algorithms (EDAs) were developed as a novel kind of evolutionary algorithm fifteen years ago. In these algorithms, new populations are generated by sampling from the estimated distribution of solutions with higher fitness values: a model of this distribution is constructed in each step, instead of generating individuals through recombination operators such as crossover or mutation. Most current EDAs employ probabilistic graphical models, which are, however, either computationally very demanding or unrealistic in many real-world applications. Therefore, other kinds of models are appearing. This paper investigates the use of multivariate elliptical copulas as a model of the distribution of feasible solutions.

1. Introduction 

Evolutionary algorithms (EAs) which utilize probabilistic or linkage models of dependencies between variables are becoming increasingly popular. Such algorithms, called Estimation of Distribution Algorithms (EDAs) [1] or Probabilistic Model Building Genetic Algorithms (PMBGAs), have many aspects in common with the most popular EAs, genetic algorithms. Like them, they evolve a set of promising candidate solutions, a population of individuals. During each generation, a new set of individuals is generated and a part of the former population is replaced according to some selection criterion.

Nevertheless, new individuals are generated differently in EDAs. Instead of using genetic operators like crossover and mutation, EDAs estimate the probability distribution of the most promising solutions, and new populations are obtained by random sampling from this distribution. This paper recalls the most important kinds of EDAs and models for estimating the probability distributions, while focusing on the recent use of copulas as a model of the distribution, especially multivariate elliptical copulas.

The paper is divided into the following sections. In the next section, the general concept of EDAs is briefly presented. The third section gives a short overview of the different variants of EDAs, and Section 4 focuses on the use of elliptical copulas as a probabilistic model. In the last section, two experiments evaluating the proposed solution are described.

2. Basic principles of EDAs 

Most EAs and EDAs are rather similar. The general pseudo-code of EDAs is outlined in Fig. 1. Here, steps (1), (2) and (3) are the same as in many evolutionary algorithms, while steps (4) and (5) are specific to EDAs.

1: $P_0$ ← randomly generate $m$ individuals
2: for $k = 1, 2, \dots$ until a stopping criterion is met do
3:    pool ← select $n < m$ individuals from $P_{k-1}$ according to the selection method
4:    $p_k(x) = p(x \mid \text{pool})$ ← estimate the probability distribution of an individual based on the selected individuals (in pool)
5:    $P_k$ ← sample new population from $p_k(x)$
6: end for

Figure 1: Estimation of distribution algorithm.
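
The loop of Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: the objective `sphere`, truncation selection, and the model with independent normal marginals (in the spirit of UMDAc) are all choices made for this sketch, as are the parameter values.

```python
import random
import statistics

def sphere(x):
    """Toy objective to minimise (an assumption of this sketch)."""
    return sum(v * v for v in x)

def simple_eda(objective, dim=5, m=100, n=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population P_0 of m individuals
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(m)]
    for _ in range(generations):  # Step 2: fixed number of generations as stopping criterion
        # Step 3: truncation selection of the n best individuals
        pool = sorted(population, key=objective)[:n]
        # Step 4: estimate p_k(x) as a product of independent normal marginals
        means = [statistics.fmean(ind[j] for ind in pool) for j in range(dim)]
        stdevs = [statistics.pstdev([ind[j] for ind in pool]) + 1e-12 for j in range(dim)]
        # Step 5: sample the new population P_k from p_k(x)
        population = [[rng.gauss(means[j], stdevs[j]) for j in range(dim)]
                      for _ in range(m)]
    return min(population, key=objective)

best = simple_eda(sphere)
```

Only step 4 and step 5 would change if a richer model (a Gaussian network or a copula) replaced the independent marginals.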

The main difference between EDAs and EAs lies in the way they generate new individuals from the previous generation. Whereas traditional EAs, for example genetic algorithms, try to implicitly combine building blocks representing promising parts of the genetic code of good solutions already found, using genetic operations (crossover, mutation) [2], EDAs try to find correlations among variables in an explicit way.

The probability distribution of the input variables is estimated. In the following text, the term model denotes a formal framework for estimating the joint probability distribution of individuals. Given this model, generating new individuals is relatively easy. However, estimating the distribution with the model is often a bottleneck of EDAs, especially when the problem being solved is hard and complex dependencies among variables have to be determined.

2.1. Probabilistic graphical models 

The majority of present EDAs estimate the probability distribution with probabilistic graphical models [1,3]. These models make use of a directed acyclic graph (DAG) where each node corresponds to one input variable $X_i$, and the arcs define dependencies between variables.

Further, the models consist of a set of unconditional probabilities for all root nodes of the graph, $p(X_i = x_i \mid \emptyset)$, and a set of conditional probabilities for the other nodes, given the values of their respective parents $Pa_i$: $p(X_i = x_i \mid pa_i)$. Here, $p$ denotes a generalized probability distribution which stands for the probability mass $p(X_i = x_i)$ for discrete random variables and the density function $f(x_i)$ for continuous $X_i$.

From the conditional (in)dependence defined by the DAG, the factorization of the joint probability distribution of the variables can be expressed as

$p(x_1, \dots, x_n \mid \theta_S) = \prod_{i=1}^{n} p(x_i \mid pa_i^S, \theta_i)$. (1)

The most frequent representatives of probabilistic graphical models are Bayesian networks for discrete variables and Gaussian networks for continuous variables. While in the case of Bayesian networks the joint probability distribution can be written analogously to (1), Gaussian networks use the density function of the normal distribution with nontrivial parameters

$f(x_1, \dots, x_n \mid \theta_S) = \prod_{i=1}^{n} f(x_i \mid pa_i, \theta_i)$, where $f(x_i \mid pa_i, \theta_i) \sim N\big(m_i + \sum_{x_j \in pa_i} b_{ji}(x_j - m_j),\, v_i\big)$. (2)

The parameter $m_i$ denotes the unconditional mean of $X_i$, $b_{ji}$ is a linear coefficient reflecting the strength of the relationship between variables $X_j$ and $X_i$, and $v_i$ is the variance of $X_i$ given $Pa_i$.

3. Current variants of EDAs

Today's variants of EDAs can be distinguished according to the complexity of interactions among variables, and different variants for discrete and continuous variables have been developed.

The simplest algorithms consider all the variables independent. For discrete domains, PBIL [4], UMDA [5] and cGA [6] exist; UMDAc is a continuous variant of the second one.

Algorithms in which each variable can depend on one predecessor are, for example, MIMIC [7], COMIT [8], or BMDA [9].

Multiple dependencies can be expressed by BOA (Bayesian Optimization Algorithm) and its variants [10], probably the most actively developed discrete EDA today. Other EDAs with multiple dependencies are, for example, EBNA [11] or EDA [12]. Continuous versions are rather few, but some exist: EGNA [13] or rBOA [14].

4. Copulas as a probabilistic model for EDAs 

The major motivation for using copulas in EDAs lies in their simplicity and their ability to express joint distributions other than Gaussian. A copula is a function which connects two or more uniformly distributed variables together

$C(u_1, u_2, \dots, u_n)$, $u_i \sim U(0, 1)$. (3)

Instead of uniformly distributed $u_1, \dots, u_n$, the values of univariate marginal distribution functions $F_i(x_i)$ of arbitrary variables can be used. This forms a joint multivariate distribution of these variables, as described by Sklar's theorem (see eqn. (4)).

More formally, a copula is a function $C : [0,1]^n \to [0,1]$ satisfying the following conditions:

1. $C(x_1, \dots, x_n) = 0$ whenever $\exists i : x_i = 0$,

2. $C(x_1, \dots, x_n) = x_j$ whenever $\forall i \neq j : x_i = 1$, and

3. $C(x_1, \dots, x_n)$ is $n$-increasing (see [15] for details).

In particular, it follows from condition 2 that all copula functions have uniformly distributed marginals.

The important result of Sklar's theorem [15] is that for any given joint distribution function $H(x_1, \dots, x_n)$ with marginals $F_1(x_1), \dots, F_n(x_n)$, there exists an $n$-copula $C$ such that for all $(x_1, \dots, x_n) \in [\mathbb{R} \cup \{-\infty, \infty\}]^n$

$H(x_1, \dots, x_n) = C(F_1(x_1), \dots, F_n(x_n))$. (4)

Expressing or estimating the marginal distributions $F_1(x_1), \dots, F_n(x_n)$ from data is relatively easy. However, as the true distribution function $H$ is usually unknown and Sklar's theorem gives only the existence of the copula $C$, the correct variant of the copula function and its parameters have to be estimated.

The use of copulas in EDAs has appeared in the literature only recently [16-19]. Most of these publications use only bivariate copulas, which are connected in different ways to form a multivariate distribution function.

Several kinds of copulas are distinguished in the literature. The best known are the elliptical and Archimedean families. While conventional maximum-likelihood (ML) based methods for parameter estimation exist for the multivariate elliptical copulas (primarily Gaussian and t-copulas), estimation and sampling of multivariate Archimedean copulas require either a hierarchical approach of nesting, or a method using Laplace transforms ([20], p. 67).

4.1. Elliptical copulas 

A well-known member of the elliptical family is the Gaussian copula

$C(u_1, \dots, u_n; \rho) = \Phi_\rho\big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_n)\big)$ (5)

with the multivariate normal (cumulative) distribution function (CDF) $\Phi_\rho$ (described by a covariance matrix $\rho$) and inverses of univariate normal CDFs $\Phi^{-1}$. Gaussian copulas have attracted attention, for example, in the financial sector as a means of modelling risks [21], although their true contribution in this area is disputable [22].

The second example of the elliptical family is the t-copula, which has a very similar structure, but Student's t-distribution is used instead of the normal distribution.

4.2. Gaussian and t-copula-based EDA

Using copulas as a probabilistic model for EDAs requires (a) a method for estimating marginal distributions, (b) a method for fitting a proper kind of copula to the data (previously transformed by their corresponding marginal distribution functions), and (c) a method for generating individuals from the fitted copula. The crucial advantage of using copulas is that parts (a) and (b) can be performed independently.

As stated above, standard methods for (a) estimating marginal distributions and (b) fitting Gaussian and t-copulas already exist. In our experiment, empirical estimation smoothed via kernels was used for the margins, and ML estimates were used for the parameters of the copulas.

Given the marginal distributions and the parameters of the multivariate Gaussian or Student's t-distribution, sampling (c) from these multivariate distributions is well studied, too. All the steps are summarized in Fig. 2.

 1: Input: matrix $X$ of selected individuals,
           $m$, the number of individuals to generate
 2: $F_1(x_1), \dots, F_p(x_p)$ ← estimate marginal distribution functions (CDFs) of the $p$ columns of $X$ and their inverses $F_1^{-1}, \dots, F_p^{-1}$
 3: convert $x_1, \dots, x_p$ to $U(0,1)$ using the CDFs: $(u_1, \dots, u_p)$ ← $(F_1(x_1), \dots, F_p(x_p))$
 4: $U$ ← $(u_1 \cdots u_p)$
 5: if Gaussian copula then
 6:    $\rho$ ← estimate the covariance matrix of the normal CDF from the matrix $U$
 7:    randomly generate $m$ samples $(s_1, \dots, s_m) \sim \Phi(0, \rho)$
 8: else if Student's t-copula then
 9:    $(\Sigma, d_f)$ ← estimate $\Sigma$ and the degrees of freedom $d_f$ of the t-distribution from the matrix $U$
10:    randomly generate $m$ samples $(s_1, \dots, s_m)$ from the t-distribution
11: end if

13: $(y_1, \dots, y_m)$ ← $(F_1^{-1}(S_{\cdot 1}), \dots, F_p^{-1}(S_{\cdot p}))$
14: return $(y_1, \dots, y_m)$

Figure 2: Estimation and sampling from Gaussian and t-copula.
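
The Gaussian branch of Fig. 2 can be sketched with the Python standard library alone. The sketch makes several assumptions that are not taken from the paper: the marginal CDFs are plain rank-based empirical CDFs (no kernel smoothing), the dependence matrix is estimated as the correlation matrix of the normal scores $\Phi^{-1}(u)$, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def empirical_cdf_values(column):
    """Map a column to (0, 1) by scaled ranks r / (n + 1)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def empirical_quantile(sorted_column, u):
    """Inverse empirical CDF with linear interpolation."""
    pos = u * (len(sorted_column) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_column) - 1)
    return sorted_column[lo] + (pos - lo) * (sorted_column[hi] - sorted_column[lo])

def correlation_matrix(cols):
    p, n = len(cols), len(cols[0])
    means = [sum(c) / n for c in cols]
    cov = [[sum((cols[a][k] - means[a]) * (cols[b][k] - means[b]) for k in range(n)) / n
            for b in range(p)] for a in range(p)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(p)]
            for a in range(p)]

def cholesky(a):
    """Lower-triangular factor L with L * L^T = a."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 1e-12))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def gaussian_copula_sample(X, m, rng):
    """X: list of rows (selected individuals); returns m new rows."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    # Marginals -> uniforms -> normal scores
    scores = [[STD_NORMAL.inv_cdf(u) for u in empirical_cdf_values(c)] for c in cols]
    # Dependence parameter of the Gaussian copula
    low = cholesky(correlation_matrix(scores))
    sorted_cols = [sorted(c) for c in cols]
    out = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        s = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        u = [STD_NORMAL.cdf(v) for v in s]  # correlated normals back to uniforms
        out.append([empirical_quantile(sorted_cols[j], u[j]) for j in range(p)])
    return out

rng = random.Random(1)
X = []
for _ in range(200):
    a = rng.gauss(0.0, 1.0)
    X.append([math.exp(a), a + 0.3 * rng.gauss(0.0, 1.0)])  # skewed, dependent margins
new = gaussian_copula_sample(X, 500, rng)
```

Because the marginals are handled separately from the dependence structure, the skewed first column keeps its shape in the generated sample while the dependence between the columns is carried by the copula alone.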

5. Experiments 

5.1. Aerospace trajectory optimization problem

The described copula learning and sampling algorithm has been implemented in the Matlab environment using the Statistics Toolbox, and this part was integrated with the Mateda toolbox [23], which provides implementations of several EDAs.

As a test function, we have chosen the "SAGAS" problem from the GTOP Database [24], a black-box optimization problem of finding the best trajectory for a spacecraft equipped with chemical propulsion. The results of the copula-based EDA (CEDA) with Gaussian and t-copulas, and a comparison with EDAs based on Gaussian networks (EDA with a mixture of Gaussians and EGNA), are in Tab. 1. The objective values in the table represent the consumption of the spacecraft: the lower the number, the better. All the experiments used a population of size 1000 and ran for 30 generations.


            Gaussian    t-copula    mixture of
            CEDA        CEDA        Gauss. EDA    EGNA      GA
mean        1340.7      1341.8      1406.1        1401.5    1440.1
st. dev.    103.1       143.4       133.9         116.9     203.1

Table 1: Experimental results: the best objective values achieved after 30 generations, averaged over 30 runs of each algorithm.

The results in the table show that the copula-based EDAs outperformed not only the genetic algorithm, but also the EDA with a mixture of Gaussians (the standard method provided for this task in the Mateda toolbox) and EGNA (another common EDA with arbitrary Gaussian networks, which are learned very slowly). Further, the Gaussian copula gives more stable results than the t-copula. The average progress of the best objective values in the first 30 generations is shown in Fig. 3.

Figure 3: Best objective function progress in the first 30 generations.

5.2. COCO - COmparing Continuous Optimizers 

The proposed CEDA algorithm has been tested on several non-noisy benchmark functions from COCO, a platform for comparison of real-parameter global black-box optimizers [25], namely Sphere (1), Skew Rastrigin-Bueche separable function (4), original Rosenbrock (8), Sharp ridge (13), Schaffer F7 with asymmetric x-transformation (17) and Schwefel x sin(x) with tridiagonal transformation (20; the numbers indicate the numbers of the functions in the COCO platform). Three different algorithms were used to optimize each function: CEDA (with Gaussian copula), EGNA (both utilizing the Mateda toolbox in Matlab) and a standard continuous Matlab implementation of a GA with phenotype encoding. All the algorithms used the same population sizes and stopping criteria. Optimization was performed 10 times with each setting, in $D \in \{2, 5, 10\}$ dimensions, with random uniform initialization on the interval $[-5, 5]^D$.

Figure 4: Proportions of functions/function settings (vertical axis) able to reach $f_{target}$ within given expected running times relative to the dimension (ERT/D, horizontal axis), average results for all testing functions; the graphs correspond (from top) to 2-, 5- and 10-dimensional inputs, respectively.

Figure 5: Expected running time (ERT) divided by dimension versus dimension in log-log presentation (panel title: 1 Sphere). First row: CEDA, second row: EGNA, third row: GA. Shown are different target values $f_{opt} + \Delta f$, where $\Delta f = 10^{\{+1, 0, -1, -2, -3, -5, -8\}}$ and the exponent is given in the legend of $f_1$. Plus symbols (+) show the median number of $f$-evaluations for the best reached target value. Crosses (×) indicate the total number of $f$-evaluations (#FEs($-\infty$)) divided by the number of trials. Numbers above ERT-symbols indicate the number of successful trials. Y-axis annotations are decimal logarithms. The thick light line with diamond markers shows the single best results from BBOB 2009 for $\Delta f = 10^{-8}$.


Following the methodology suggested in COCO, the main measure for assessing the performance of the algorithms was the expected running time (ERT): the expected number of objective function evaluations (FE) needed to reach a given target function value $f_{target}$, considered with respect to the actual dimension $D$. Multiple target objective values are considered: given the optimal value $f_{opt}$ for each function setting, the target values (for minimization) are

$f_{target} = f_{opt} + \Delta f$, $\Delta f \in \{10^{3}, 10^{2}, \dots, 10^{-8}\}$.
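
As an illustration, the ERT for one target value can be computed from per-trial records as sketched below. The sketch assumes the usual COCO convention, which the text does not spell out: evaluations are counted until the target is first reached in successful trials and until the end of the run in unsuccessful ones, and their total is divided by the number of successful trials. The trial data are hypothetical.

```python
import math

def expected_running_time(evals_per_trial, reached_target):
    """ERT = evaluations spent in all trials / number of successful trials."""
    successes = sum(reached_target)
    if successes == 0:
        return math.inf  # target never reached
    return sum(evals_per_trial) / successes

# Hypothetical records of 10 trials on a D = 5 problem
evals = [1200, 900, 5000, 1500, 5000, 1100, 1300, 5000, 950, 1050]
hit = [True, True, False, True, False, True, True, False, True, True]
D = 5
ert = expected_running_time(evals, hit)
ert_per_dim = ert / D
```

An infinite ERT corresponds to the ∞ entries in the ERT tables: no trial reached that target.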

Average expected running times with respect to the dimensions for the Sphere function are recorded in Tab. 2. A graphical evaluation of the average results for all six functions is given in Fig. 4; ERTs for the Sphere and Schaffer functions are depicted in Fig. 5.


1 Sphere

Δf_target     1e+03   1e+02   1e+01   1e+00   1e-01   1e-02   1e-03   1e-04   1e-05   1e-07
ERT_best/D    0.10    0.13    161     457     751     ∞       ∞       ∞       ∞       ∞
CEDA          1       3.8     1       1       1       17e-3/1e3
EGNA          1       3.2     1.9     3.3     85e-2/1e3
GA            1       1       8.9     62e-1/1e3

1 Sphere

Δf_target     1e+03   1e+02   1e+01   1e+00   1e-01   1e-02   1e-03   1e-04   1e-05   1e-07
ERT_best/D    0.20    0.20    15      155     353     550     694     1009    ∞       ∞
CEDA          1       1       1       1       1       1       1       1       45e-6/1e3
EGNA          1       1.2     1.1     1.6     2.1     18      36e-3/1e3
GA            1       1       28      8.1     31      43e-2/1e3

1 Sphere

Δf_target     1e+03   1e+02   1e+01   1e+00   1e-01   1e-02   1e-03   1e-04   1e-05   1e-07
ERT_best/D    0.50    0.50    2.3     11      58      162     265     356     458     626
CEDA          1       1       1.2     1.3     1       1       1       1       1       1
EGNA          1       1       1       1       1.9     1.4     1.2     1.2     1.2     1.5
GA            1       1       48      31      8.2     7.9     13      30      54e-4/1e3

Table 2: Running time excess ERT/ERT_best on $f_1$ (Sphere function); in italics are given the median final function value and the median number of function evaluations to reach this value divided by the dimension.

6. Conclusion

The most important types of estimation of distribution algorithms were described in this paper; special emphasis has been given to the use of copulas as a novel kind of probabilistic model, forming a "copula-based" EDA (CEDA).

The performance of the proposed algorithms was tested on a well-studied benchmark problem of aerospace trajectory optimization as well as on the COCO optimization platform. Copula-based EDAs slightly outperformed traditional EDAs as well as the standard genetic algorithm.

Abstract

This paper is concerned with the reliability of individual predictions in regression. For this purpose we describe conformal predictors and some methods for estimating the reliability of individual predictions, such as sensitivity analysis or local modeling of prediction error. Finally, we carry out a simulation to compare the methods in an experiment.

1. Introduction 

This paper is concerned with the reliability of predictions in regression models. In the first section we are interested in conformal predictors, which for every confidence level $1 - \varepsilon$ output a prediction set. The conformal predictors should be valid in the sense that in the long run the frequency of error does not exceed $\varepsilon$ at each confidence level $1 - \varepsilon$, and the prediction set should be as small as possible. The second chapter describes different approaches to estimating the reliability of individual predictions in regression, such as sensitivity analysis or local modeling of prediction error. In the third chapter a simulation study comparing different reliability estimates is introduced.

2. Conformal prediction 

We assume that we have successive pairs

$(x_1, y_1), (x_2, y_2), \dots$, (1)

called examples. Each example $(x_i, y_i)$ consists of an object $x_i$ and its label $y_i$. The objects are elements of a measurable space $X$ called the object space and the labels are elements of a measurable space $Y$ called the label space. Moreover, we assume that $X$ is non-empty and that the $\sigma$-algebra on $Y$ is different from $\{\emptyset, Y\}$. We denote $z_i := (x_i, y_i)$ and we set

$Z := X \times Y$ (2)

and call $Z$ the example space. Thus the infinite data sequence (1) is an element of the measurable space $Z^\infty$.

Our standard assumption is that the examples are chosen independently from some probability distribution $Q$ on $Z$, which means that the infinite data sequence (1) is drawn from the power probability distribution $Q^\infty$ on $Z^\infty$. Usually we need only the slightly weaker assumption that the infinite data sequence (1) is drawn from a distribution $P$ on $Z^\infty$ that is exchangeable, which means that for every $n \in \mathbb{N}$, every permutation $\pi$ of $\{1, \dots, n\}$, and every measurable set $E \subseteq Z^n$ it holds that

$P\{(z_1, z_2, \dots) \in Z^\infty : (z_1, \dots, z_n) \in E\} = P\{(z_1, z_2, \dots) \in Z^\infty : (z_{\pi(1)}, \dots, z_{\pi(n)}) \in E\}$.

We denote by $Z^*$ the set of all finite sequences of elements of $Z$, and by $Z^n$ the set of all sequences of elements of $Z$ of length $n$. The order in which old examples appear should not make any difference. In order to formalize this point we need the concept of a bag. A bag of size $n \in \mathbb{N}$ is a collection of $n$ elements, some of which may be identical. To identify a bag we must say what elements it contains and how many times each of these elements is repeated. We write $\lbag z_1, \dots, z_n \rbag$ for the bag consisting of elements $z_1, \dots, z_n$, some of which may be identical with each other. We write $Z^{(n)}$ for the set of all bags of size $n$ of elements of a measurable space $Z$. The set $Z^{(n)}$ is itself a measurable space. It can be defined formally as the power space $Z^n$ with a nonstandard $\sigma$-algebra, consisting of measurable subsets of $Z^n$ that contain all permutations of their elements. We write $Z^{(*)}$ for the set of all bags of elements of $Z$.
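
In code, a bag behaves like a multiset. A minimal sketch using Python's `collections.Counter` follows; the helper `bag` and the example values are hypothetical and serve only to show that order is ignored while repetitions are counted.

```python
from collections import Counter

def bag(*examples):
    """Represent a bag as a multiset: element -> number of repetitions."""
    return Counter(examples)

b1 = bag(("x1", 1.0), ("x2", 2.0), ("x1", 1.0))
b2 = bag(("x1", 1.0), ("x1", 1.0), ("x2", 2.0))
same = b1 == b2           # the order of the examples does not matter
size = sum(b1.values())   # repeated elements are counted
```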

2.1. Confidence predictors

We assume that at the $n$th trial we first have only the object $x_n$ and only later do we get the label $y_n$. If we simply want to predict $y_n$, then we need a function

$D : Z^* \times X \to Y$. (3)

We call such a function a simple predictor, always assuming it is measurable. For any sequence of old examples $x_1, y_1, \dots, x_{n-1}, y_{n-1} \in Z^*$ and any new object $x_n$, it gives $D(x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n) \in Y$ as its prediction for the new label $y_n$.

Instead of merely choosing a single element of $Y$ as our prediction for $y_n$, we want to give subsets of $Y$ large enough that we can be confident that $y_n$ will fall in them, while also giving smaller subsets in which we are less confident. An algorithm that predicts in this sense requires an additional input $\varepsilon \in (0, 1)$, which we call the significance level; the complementary value $1 - \varepsilon$ is called the confidence level. Given all these inputs

$x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n, \varepsilon$ (4)

an algorithm $\Gamma$ that interests us outputs a subset

$\Gamma^{\varepsilon}(x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n)$ (5)

of $Y$. We require this subset to shrink as $\varepsilon$ is increased, that is,

$\Gamma^{\varepsilon_1}(x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n) \subseteq \Gamma^{\varepsilon_2}(x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n)$ (6)

whenever $\varepsilon_1 \ge \varepsilon_2$.

Formally, we call a measurable function

$\Gamma : Z^* \times X \times (0, 1) \to 2^Y$ (7)

that satisfies (6) for all $n \in \mathbb{N}$, all incomplete data sequences $x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n$ and all significance levels $\varepsilon_1 \ge \varepsilon_2$ a confidence predictor.

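We now introduce a formal notation for the errors $\Gamma$ makes when it processes the data sequence

$\omega = (x_1, y_1, x_2, y_2, \dots)$ (8)

at significance level $\varepsilon$. Whether $\Gamma$ makes an error on the $n$th trial can be represented by a number that is one in case of an error and zero in case of no error

$\mathrm{err}_n^{\varepsilon}(\Gamma, \omega) := \begin{cases} 1 & \text{if } y_n \notin \Gamma^{\varepsilon}(x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n), \\ 0 & \text{otherwise,} \end{cases}$ (9)

and the number of errors during the first $n$ trials is

$\mathrm{Err}_n^{\varepsilon}(\Gamma, \omega) := \sum_{i=1}^{n} \mathrm{err}_i^{\varepsilon}(\Gamma, \omega)$. (10)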

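If $\omega$ is drawn from an exchangeable probability distribution $P$, the number $\mathrm{err}_n^{\varepsilon}(\Gamma, \omega)$ is the realized value of a random variable, which we may designate $\mathrm{err}_n^{\varepsilon}(\Gamma, P)$. We say that a confidence predictor $\Gamma$ is exactly valid if for each $\varepsilon$

$\mathrm{err}_1^{\varepsilon}(\Gamma, P), \mathrm{err}_2^{\varepsilon}(\Gamma, P), \dots$ (11)

is a sequence of independent Bernoulli random variables with parameter $\varepsilon$.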

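The confidence predictor $\Gamma$ is conservatively valid if for any exchangeable probability distribution $P$ on $Z^\infty$ there exists a probability space with two families

$(\xi_n^{(\varepsilon)} : \varepsilon \in (0, 1), n = 1, 2, \dots)$ (12)

and

$(\eta_n^{(\varepsilon)} : \varepsilon \in (0, 1), n = 1, 2, \dots)$ (13)

of $\{0, 1\}$-valued variables such that

• for a fixed $\varepsilon$, $\xi_1^{(\varepsilon)}, \xi_2^{(\varepsilon)}, \dots$ is a sequence of independent Bernoulli random variables with parameter $\varepsilon$;

• for all $n$ and $\varepsilon$, $\eta_n^{(\varepsilon)} \le \xi_n^{(\varepsilon)}$;

• the joint distribution of $\mathrm{err}_n^{\varepsilon}(\Gamma, P)$, $\varepsilon \in (0, 1)$, $n = 1, 2, \dots$, coincides with the joint distribution of $\eta_n^{(\varepsilon)}$, $\varepsilon \in (0, 1)$, $n = 1, 2, \dots$.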

To obtain exact validity we define a randomized confidence predictor as a measurable function

$\Gamma : (X \times [0, 1] \times Y)^* \times (X \times [0, 1]) \times (0, 1) \to 2^Y$ (14)

which, for all significance levels $\varepsilon_1 \ge \varepsilon_2$, all positive integers $n$, and all incomplete data sequences

$x_1, \tau_1, y_1, \dots, x_{n-1}, \tau_{n-1}, y_{n-1}, x_n, \tau_n$, (15)

where $x_i \in X$, $\tau_i \in [0, 1]$ and $y_i \in Y$ for all $i$, satisfies

$\Gamma^{\varepsilon_1}(x_1, \tau_1, y_1, \dots, x_{n-1}, \tau_{n-1}, y_{n-1}, x_n, \tau_n) \subseteq \Gamma^{\varepsilon_2}(x_1, \tau_1, y_1, \dots, x_{n-1}, \tau_{n-1}, y_{n-1}, x_n, \tau_n)$. (16)

We will always assume that $\tau_1, \tau_2, \dots$ are random numbers independently drawn from the uniform distribution on $[0, 1]$. We define $\mathrm{err}_n^{\varepsilon}(\Gamma, \omega)$ by (9) with $x_i$ now being extended objects $x_i \in X \times [0, 1]$, and $\mathrm{Err}_n^{\varepsilon}(\Gamma, \omega)$ is defined by (10) as before.

2.2. Conformal predictors

A nonconformity measure is a measurable mapping

$A : Z^{(*)} \times Z \to \mathbb{R}$. (17)

To each possible bag of old examples and each possible new example, $A$ assigns a numerical score indicating how different the new example is from the old ones. It is sometimes convenient to consider separately how a nonconformity measure deals with bags of different sizes. If $A$ is a nonconformity measure, for each $n = 1, 2, \dots$ we define the function

$A_n : Z^{(n-1)} \times Z \to \mathbb{R}$ (18)

as the restriction of $A$ to $Z^{(n-1)} \times Z$. The sequence $(A_n : n \in \mathbb{N})$, which we abbreviate to $(A_n)$, will also be called a nonconformity measure.

Given a nonconformity measure $(A_n)$ and a bag $\lbag z_1, \dots, z_n \rbag$ we can compute the nonconformity score

$\alpha_i := A_n(\lbag z_1, \dots, z_{i-1}, z_{i+1}, \dots, z_n \rbag, z_i)$ (19)

for each example $z_i$ in the bag. Because a nonconformity measure $(A_n)$ may be scaled however we like, the numerical value of $\alpha_i$ does not, by itself, tell us how unusual $(A_n)$ finds $z_i$ to be. For that we define the p-value for $z_i$ as

$\frac{|\{j = 1, \dots, n : \alpha_j \ge \alpha_i\}|}{n}$. (20)

The conformal predictor defined by a nonconformity measure $(A_n)$ is the confidence predictor $\Gamma$ obtained by setting

$\Gamma^{\varepsilon}(x_1, y_1, \dots, x_{n-1}, y_{n-1}, x_n)$ (21)

equal to the set of all labels $y \in Y$ such that

$\frac{|\{i = 1, \dots, n : \alpha_i \ge \alpha_n\}|}{n} > \varepsilon$, (22)

where

$\alpha_i := A_n(\lbag (x_1, y_1), \dots, (x_{i-1}, y_{i-1}), (x_{i+1}, y_{i+1}), \dots, (x_{n-1}, y_{n-1}), (x_n, y) \rbag, (x_i, y_i))$, $\forall i = 1, \dots, n-1$,

$\alpha_n := A_n(\lbag (x_1, y_1), \dots, (x_{n-1}, y_{n-1}) \rbag, (x_n, y))$.

The smoothed conformal predictor determined by the nonconformity measure $(A_n)$ is a randomized confidence predictor $\Gamma$ obtained by setting

$\Gamma^{\varepsilon}(x_1, \tau_1, y_1, \dots, x_{n-1}, \tau_{n-1}, y_{n-1}, x_n, \tau_n)$ (23)

equal to the set of all labels $y \in Y$ such that

$\frac{|\{i = 1, \dots, n : \alpha_i > \alpha_n\}| + \tau_n |\{i = 1, \dots, n : \alpha_i = \alpha_n\}|}{n} > \varepsilon$, (24)

where the $\alpha_i$ are defined as for the conformal predictor above. The left-hand side of (24) is called the smoothed p-value.

Proofs of the next two theorems can be found in [2].

Theorem 2.1 All conformal predictors are conservative.

Theorem 2.2 Any smoothed conformal predictor is exactly valid.

If we are given a simple predictor (3) whose output does not depend on the order in which the old examples are presented, then the simple predictor $D$ defines a prediction rule $D_{\lbag z_1, \dots, z_n \rbag} : X \to Y$ by the formula

$D_{\lbag z_1, \dots, z_n \rbag}(x) := D(z_1, \dots, z_n, x)$. (25)

A natural measure of nonconformity of $z_i$ is the deviation of the predicted label

$\hat{y}_i := D_{\lbag z_1, \dots, z_n \rbag}(x_i)$ (26)

from the true label $y_i$. We can also use the deleted prediction defined as

$\hat{y}_{(i)} := D_{\lbag z_1, \dots, z_{i-1}, z_{i+1}, \dots, z_n \rbag}(x_i)$. (27)

More generally, the prediction rule $D_{\lbag z_1, \dots, z_n \rbag}$ may map $X$ to some prediction space $\hat{Y}$ not necessarily coinciding with $Y$. An invariant simple predictor is a function $D$ that maps each bag $\lbag z_1, \dots, z_n \rbag$ of each size $n$ to a prediction rule $D_{\lbag z_1, \dots, z_n \rbag} : X \to \hat{Y}$ and such that the function

$(\lbag z_1, \dots, z_n \rbag, x) \mapsto D_{\lbag z_1, \dots, z_n \rbag}(x)$ (28)

of the type $Z^{(n)} \times X \to \hat{Y}$ is measurable for all $n$.

A discrepancy measure is a measurable function $\Delta : Y \times \hat{Y} \to \mathbb{R}$. Given an invariant simple predictor $D$ and a discrepancy measure $\Delta$, we define functions $A_n$ as follows: for any $((x_1, y_1), \dots, (x_n, y_n)) \in Z^*$, the values

$\alpha_i = A_n(\lbag (x_1, y_1), \dots, (x_{i-1}, y_{i-1}), (x_{i+1}, y_{i+1}), \dots, (x_n, y_n) \rbag, (x_i, y_i))$ (29)

are defined by the formula

$\alpha_i := \Delta(y_i, D_{\lbag z_1, \dots, z_n \rbag}(x_i))$ (30)

or the formula

$\alpha_i := \Delta(y_i, D_{\lbag z_1, \dots, z_{i-1}, z_{i+1}, \dots, z_n \rbag}(x_i))$. (31)

It can be easily checked that in both cases $(A_n)$ forms a nonconformity measure.
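
A conformal predictor built from formula (31) can be sketched as follows. The sketch rests on simplifying assumptions that are not part of the text: the invariant simple predictor ignores the objects and predicts the mean of the labels in the bag, the discrepancy measure is $\Delta(y, \hat{y}) = |y - \hat{y}|$, and the label space is searched on a finite grid. All names and data are hypothetical.

```python
def deleted_scores(labels):
    """alpha_i = |y_i - mean of the other labels|, i.e. formula (31) with a mean predictor."""
    n, total = len(labels), sum(labels)
    return [abs(y - (total - y) / (n - 1)) for y in labels]

def p_value(old_labels, candidate):
    """Fraction of examples at least as nonconforming as the candidate label."""
    alphas = deleted_scores(old_labels + [candidate])
    return sum(a >= alphas[-1] for a in alphas) / len(alphas)

def prediction_set(old_labels, grid, eps):
    """All candidate labels whose p-value exceeds the significance level eps."""
    return [y for y in grid if p_value(old_labels, y) > eps]

old = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.05, 0.95]
grid = [i / 10 for i in range(0, 31)]
set_80 = prediction_set(old, grid, 0.2)   # confidence level 0.8
set_50 = prediction_set(old, grid, 0.5)   # confidence level 0.5
```

The set at the lower confidence level is contained in the set at the higher one, which is the nesting property (6).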

3. Reliability estimates

In this chapter we are interested in different approaches to estimating the reliability of individual predictions in regression.


3.1. Sensitivity analysis 

To estimate the reliability for a given example $x$, we compute the initial prediction $K$ of the example $x$. Then we label $x$ with $K + \varepsilon(l_{max} - l_{min})$, where $\varepsilon$ is a sensitivity parameter, and $l_{min}$ and $l_{max}$ denote the lower and the upper label bounds of the learning examples, respectively. We add the newly labeled $x$ to the learning set. In the next step, a new sensitivity model is induced on the modified learning set and this model is used to compute a sensitivity prediction $K_\varepsilon$ for the same particular example $x$. After computing different sensitivity predictions using different values of the parameter $\varepsilon \in E$, where $E$ is some set of positive real parameters, the predictions are combined into different reliability estimates. Sensitivity analysis - variance is defined as

$\mathrm{SEvar}(x) := \frac{\sum_{\varepsilon \in E} (K_\varepsilon - K_{-\varepsilon})}{|E|}$ (32)

and sensitivity analysis - bias is defined as

$\mathrm{SEbias}(x) := \frac{\sum_{\varepsilon \in E} \big[(K_\varepsilon - K) + (K_{-\varepsilon} - K)\big]}{2|E|}$. (33)
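
The two estimates can be sketched on a toy problem. The regression model here is an ordinary least-squares line in one dimension, which is an assumption of this sketch (the simulation in this paper uses RBF networks); the data and the set $E$ are hypothetical.

```python
def fit_line(points):
    """Ordinary least squares y = a + b*x on (x, y) pairs (stand-in model)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return lambda x: my + b * (x - mx)

def sensitivity_estimates(train, x, eps_values):
    labels = [y for _, y in train]
    spread = max(labels) - min(labels)      # l_max - l_min
    K = fit_line(train)(x)                  # initial prediction

    def k_eps(e):
        # Retrain with x labeled K + e * (l_max - l_min), then predict x again
        return fit_line(train + [(x, K + e * spread)])(x)

    se_var = sum(k_eps(e) - k_eps(-e) for e in eps_values) / len(eps_values)
    se_bias = sum((k_eps(e) - K) + (k_eps(-e) - K)
                  for e in eps_values) / (2 * len(eps_values))
    return se_var, se_bias

train = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8), (4.0, 8.1)]
se_var, se_bias = sensitivity_estimates(train, 2.5, [0.01, 0.1, 0.5, 1.0, 2.0])
```

For a model that is linear in the labels, as here, the shifts for $+\varepsilon$ and $-\varepsilon$ cancel and the bias estimate is zero up to rounding; a nonlinear model would generally give a nonzero value.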



3.2. Variance of a bagged model 


We are given a learning set $L = \{(x_1, y_1), \dots, (x_n, y_n)\}$. We take repeated bootstrap samples $L^{(i)}$, $i = 1, \dots, m$, from the learning set and induce a model on each of these samples. Each of the models yields a prediction $K_i$, $i = 1, \dots, m$, for an example $x$. The label of the example $x$ is predicted by averaging the individual predictions

$K := \frac{1}{m} \sum_{i=1}^{m} K_i$. (34)

We call this procedure bootstrap aggregating or bagging. The reliability estimate of a bagged model is defined as the prediction variance

$\mathrm{BAGV}(x) := \frac{1}{m} \sum_{i=1}^{m} (K_i - K)^2$. (35)
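
Formulas (34) and (35) can be sketched as follows. The base model (mean label of the three nearest neighbors in one dimension) and the data are assumptions of this sketch, chosen only to keep it short.

```python
import random

def knn_predict(sample, x, k=3):
    """Stand-in model: mean label of the k nearest examples."""
    nearest = sorted(sample, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def bagging_variance(train, x, m=50, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(m):
        boot = [rng.choice(train) for _ in train]        # bootstrap sample L^(i)
        preds.append(knn_predict(boot, x))               # prediction K_i
    K = sum(preds) / m                                   # bagged prediction, eq. (34)
    bagv = sum((k_i - K) ** 2 for k_i in preds) / m      # BAGV(x), eq. (35)
    return K, bagv

train = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8), (4.0, 8.1), (5.0, 9.9)]
K, bagv = bagging_variance(train, 2.5)
```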


3.3. Local cross-validation reliability estimate

Suppose we are given an unlabeled example $x$ for which we wish to compute the prediction and the local cross-validation (LCV) reliability estimate. We define the set of $k$ nearest neighbors of $x$: $N = \{(x_1, C_1), \dots, (x_k, C_k)\}$, where $k$ is selected in advance. For each $(x_i, C_i) \in N$ we generate a model $M_i$ on $N \setminus \{(x_i, C_i)\}$. Then we compute the local leave-one-out (LOO) prediction $K_i$ for example $x_i$ using model $M_i$ and we compute the LOO error $E_i = |C_i - K_i|$. The LCV reliability estimate is computed as the weighted average of the nearest neighbors' local errors

$\mathrm{LCV}(x) := \frac{\sum_{(x_i, C_i) \in N} d(x_i, x) \cdot E_i}{\sum_{(x_i, C_i) \in N} d(x_i, x)}$, (36)

where $d$ is some distance on the object space.

3.4. Local modeling of prediction error

Given a set of $k$ nearest neighbors $N = \{(x_1, C_1), \dots, (x_k, C_k)\}$, we define the estimate CNK ($C_{Neighbors} - K$) for an unlabeled example $x$ as the difference between the average label of the nearest neighbors and the example's prediction $K$ (using the model that was generated on all learning examples)

$\mathrm{CNK}(x) := \frac{\sum_{i=1}^{k} C_i}{k} - K$. (37)

3.5. Density-based reliability estimate

The density-based estimation of prediction error assumes that the error is lower for predictions made in denser problem subspaces (portions of the input space with more learning examples), and higher for predictions made in sparser problem subspaces. Its disadvantage is that it does not take the learning examples' labels into account. This causes the method to perform poorly with noisy data and in cases when distinct examples are not clearly separable. Given the learning set $L = \{(x_1, y_1), \dots, (x_n, y_n)\}$, the density estimate for an unlabeled example $x$ is defined as

$p(x) = \frac{\sum_{i=1}^{n} \kappa(d(x, x_i))}{n}$, (38)

where $d$ denotes some distance on the object space and $\kappa$ is a kernel function (for example the Gaussian). Since we expect the prediction error to be higher in cases when the density is lower, $p(x)$ correlates negatively with the prediction error. To establish a positive correlation we define the reliability estimate as

$\mathrm{DENS}(x) := \max_{i = 1, \dots, n} (p(x_i)) - p(x)$. (39)

4. Simulation 

We carried out a simulation to compare different methods for estimating the local error of regression models. We used neural networks with radial basis functions (RBF networks) as our regression models, with the Gaussian used as the basis function. Therefore, the output of the RBF network $f : \mathbb{R}^n \to \mathbb{R}$ has the form

$$f(x) = \sum_{i=1}^{N} \pi_i \exp\{-\beta_i \|x - c_i\|^2\}, \tag{40}$$

where $N$ is the number of neurons in the hidden layer, $c_i$ is the center vector for neuron $i$, $\beta_i$ determines the width of the $i$th neuron and $\pi_i$ are the weights of the linear output neuron. RBF networks are universal approximators on a compact subset of $\mathbb{R}^n$. This means that an RBF network with enough hidden neurons can approximate any continuous function with arbitrary precision.
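The network output in equation (40) can be written as a short Python function. The name `rbf_output` and the parameter values below are hypothetical and serve only to show how the formula is evaluated.

```python
import math

def rbf_output(x, weights, betas, centers):
    """Output of a Gaussian RBF network: sum_i pi_i * exp(-beta_i * ||x - c_i||^2)."""
    total = 0.0
    for pi_i, beta_i, c_i in zip(weights, betas, centers):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, c_i))
        total += pi_i * math.exp(-beta_i * sq_dist)
    return total

# Hypothetical network with two hidden neurons on a 2-dimensional input.
weights = [2.0, -1.0]                 # output weights pi_i
betas = [1.0, 0.5]                    # width parameters beta_i
centers = [(0.0, 0.0), (1.0, 1.0)]    # center vectors c_i
value_at_center = rbf_output((0.0, 0.0), weights, betas, centers)
```

At the first center the first neuron contributes its full weight, while the second contributes its weight damped by its squared distance to the input.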

We used a benchmark function similar to some empirical functions used in chemistry to carry out our experiment. This function was introduced in [4]. The value of this function $\vartheta$ at the point $(x_1, x_2, x_3, x_4, x_5)$ can be expressed as

$$\vartheta(x_1, x_2, x_3, x_4, x_5) = A(x_1, x_2) - B(x_2, x_3)\, C(x_3, x_4, x_5), \tag{41}$$

where

$$A(x_1, x_2) = 0.6\, g(x_1 - 0.35, x_2 - 0.35) + 0.75\, g(x_1 - 0.1, x_2 - 0.1) + g(x_1 - 0.35, x_2 - 0.1),$$

$$B(x_2, x_3) = 0.4\, g(x_2 - 0.1, x_3 - 0.3),$$

$$C(x_3, x_4, x_5) = 5 + 25\left[1 - \{1 + (x_3 - 0.3)^2 + (x_4 - 0.15)^2 + (x_5 - 0.1)^2\}^{1/2}\right],$$

$$g(a, b) = 100 - \sqrt{(100a)^2 + (100b)^2} + \frac{\sin\sqrt{(100a)^2 + (100b)^2}}{\sqrt{(100a)^2 + (100b)^2 + (0.01)^2}}.$$

Moreover, the input vectors must satisfy the following conditions:

$$\sum_{i=1}^{5} x_i = 1 \quad \text{and} \quad x_i \in [0, 1] \text{ for } i = 1, \ldots, 5. \tag{42}$$

We compared different reliability estimates of individual 
predictions: the width of confidence intervals (CONF), 
the variance of a bagged model (BAGV) and the local 
modeling of prediction errors using nearest neighbors 
(CNK estimate). 

We repeated the following procedure five times in our simulation.

• Randomly generate 600 points satisfying the conditions (42).

• Compute the values of the benchmark function at these points.

• Split this set of points into a training set of 500 points and a testing set of 100 points.

• Split the training set into the proper training set and the validation set in proportion 2:1.

• Fit RBF networks with 1, 2, 3, 4 and 5 hidden neurons twelve times using the Matlab function lsqcurvefit on the proper training set.

• Choose the RBF network with the smallest error on the validation set for each number of hidden neurons.

• Compute the predictions of the RBF networks for the testing points for each number of hidden neurons.

• Compute the reliability estimates in each of the 100 testing points for each number of hidden neurons.

• Compute the real error in each of the 100 testing points for each number of hidden neurons.
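The data-generation and splitting steps of the procedure above can be sketched as follows. The text does not say how the random points were drawn; this sketch assumes a uniform distribution on the simplex defined by (42), obtained by normalizing exponential draws. Since 500 is not divisible by 3, the 2:1 split is rounded to 333 and 167 points, which is also an assumption.

```python
import random

def sample_simplex_point(rng, dim=5):
    """Draw one point with non-negative coordinates that sum to 1."""
    draws = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]

def split_data(points, rng):
    """Split 600 points into proper training, validation and testing sets."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    training, testing = shuffled[:500], shuffled[500:]
    n_proper = len(training) * 2 // 3      # 2:1 split of the training set
    return training[:n_proper], training[n_proper:], testing

rng = random.Random(0)                     # fixed seed for reproducibility
points = [sample_simplex_point(rng) for _ in range(600)]
proper_training, validation, testing = split_data(points, rng)
```

The model fitting itself (lsqcurvefit in the original study) is left out of the sketch on purpose.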

The initial values of the parameters $\pi_i$ were set to the mean of the response vector, and the initial values of $\beta_i$ were set to the mean of the standard deviations of the components of the training data points. The centers $c_i$ were set randomly.

The confidence intervals were computed using the Matlab function nlpredci with the option simopt on (CONF1) and off (CONF0). The Jacobian can be computed exactly, because the form of the RBF network is known and differentiable. Therefore, we supply the function nlpredci with this Jacobian.

The variance of a bagged model was computed for m = 10 different models, and the bootstrap samples were as big as the original sample.

The CNK estimates were computed for k = 2, 5, 10 neighbors.

Finally, for each testing point and each method, we computed Kendall's rank correlation coefficient between the real error and the predicted error across the RBF networks with 1, ..., 5 hidden neurons. We then averaged the correlation for each method over all 100 testing points, and finally over all runs. Results can be found in Table 1. Moreover, we computed the number of cases in which the method chose the best model, that is, in which the method assigned its smallest error estimate to the model with the smallest real error. These results can be found in Table 2. Table 3 shows the number of cases in which the best model had the given number of hidden neurons.
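The two evaluation criteria described above can be sketched in Python for a single testing point. The text does not state which variant of Kendall's coefficient was used; the sketch assumes tau-a (ties count as neither concordant nor discordant), and the error values are made up.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs divided by all pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                score += 1          # concordant pair
            elif product < 0:
                score -= 1          # discordant pair
    return score / (n * (n - 1) / 2)

def chose_best_model(real_errors, estimates):
    """True if the model with the smallest estimate also has the smallest real error."""
    return estimates.index(min(estimates)) == real_errors.index(min(real_errors))

# Hypothetical values for one testing point, one entry per network (1..5 neurons).
real_errors = [0.9, 0.4, 0.1, 0.3, 0.5]
estimates = [0.8, 0.5, 0.2, 0.1, 0.6]
tau = kendall_tau(real_errors, estimates)
best_found = chose_best_model(real_errors, estimates)
```

In this example nine of the ten pairs are ranked consistently, yet the estimate still picks the wrong network, which shows why the two criteria are reported separately.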
            CONF0    CONF1    CNK2     CNK5     CNK10    BAGV
Run 1       0.360    0.294    0.568    0.572    0.602    0.392
Run 2       0.188   -0.030    0.570    0.536    0.540    0.330
Run 3       0.264    0.078    0.552    0.644    0.650    0.290
Run 4       0.530    0.462    0.474    0.484    0.454    0.310
Run 5       0.218    0.034    0.508    0.578    0.596    0.372
Average     0.312    0.168    0.534    0.563    0.568    0.339

Table 1: Correlation of the reliability estimates and the real error in each of the five runs and the average correlation.


            CONF0    CONF1    CNK2     CNK5     CNK10    BAGV
Run 1       39       33       56       56       61       34
Run 2       28       28       53       50       43       21
Run 3       47       47       54       56       56       22
Run 4       24       19       37       42       35       20
Run 5       26       25       51       55       57       29
Sum         164      152      251      259      252      126

Table 2: Number of cases in which the best model was correctly chosen (out of 100 in each run, and the sum over all runs).


#Neurons    1      2      3      4      5
#Best       16     75     207    88     114

Table 3: The number of cases in which the best model has 1, 2, 3, 4 or 5 hidden neurons.

We can see from the previous tables that the best results are achieved by the CNK reliability estimate. It chose the best model in approximately half of all cases (251 to 259 out of 500). The number of neighbors used is not too important in our study, as the CNK estimate worked very similarly for 2, 5 and 10 neighbors. We can also see from Tables 2 and 3 that the CNK estimate chose the best model more often than if we always took the globally most suitable model, the one with three hidden neurons (207 out of 500). The BAGV estimate did not perform that well. It chose the best model in only one in four cases, which is only slightly better than random selection. The correlation is also much lower than for CNK. Finally, the confidence interval with the option simopt on worked very poorly. With the option simopt off it works a little better, but the correlation is still not too high and the best model is chosen in only one in three cases, which is slightly better than BAGV but much worse than the CNK estimate. The probable reason for this behavior of the confidence intervals is that the problem is too nonlinear to be approximated linearly, even locally, with sufficient precision.

5. Conclusion 

We described some approaches to estimating the reliability of individual predictions in regression. We compared confidence intervals, local modeling of prediction error, and variance of a bagged model in a simulation study. The local modeling of prediction error gave very good results and could be used for choosing the best model. The confidence intervals did not perform too well, probably because the problem was too complex. In our future work we will try to study conformal predictors more deeply and implement some of these methods in the FAKE GAME (Fully Automated Knowledge Extraction using Group of Adaptive Models Evolution) project.
Abstract

In our work, we introduce a new approach to computing abstractions for hybrid systems. In applications, hybrid systems are used as a formalism for modeling embedded systems and various other systems where software interacts with a physical environment.

Hybrid systems are dynamic systems with both continuous and discrete state. The state of a hybrid system is defined by a discrete mode and values for all continuous variables. The state in each mode changes continuously according to (ordinary) differential equations or differential inclusions, and it is allowed to change discretely, i.e., jump, in case a so-called jump guard condition is met. We can use the discrete part of a hybrid system to model computer program behavior and the continuous part to model the physical environment, where both components interact with each other.

Informally speaking, safety of a hybrid system is the property that, given a set of states where the system can start its evolution, i.e., initial states, and a set of states that should not be reached, i.e., unsafe states, it is not possible for the system to start in an initial state and evolve into an unsafe state.

Given a hybrid system and a property of hybrid systems, an abstraction is a discrete system such that if the abstract system has the given property, then the original (the concrete) system has the property as well. If we cannot prove that the current abstraction has the property, we refine the abstraction, that is, we include more information about the concrete system.

We present our recently published result [1], where the abstractions capture the reachability information relevant for a safety property of a hybrid system as succinctly as possible. This is achieved by an incremental refinement of the abstractions, simultaneously trying to avoid increases in their size as much as possible. The approach is independent of the concrete technique for computing reachability information, and can hence be combined with whatever technique is suitable for the problem class at hand.

We show the usefulness of the method in the algorithm for safety verification of hybrid systems based on constraint propagation and abstraction refinement [2].
Abstract

Survival analysis is a collection of statistical methods for inference on time-to-event data. If several causes of failure occur and the occurrence of one event precludes the occurrence of the other events, the situation is known as competing risks. Since competing risks violate the fundamental assumption of independent censoring, specific methods for inference are needed. The competing risks model and statistical methods for nonparametric analysis are recalled in this paper. The methods are then illustrated on a real data set of 118 Chronic Myeloid Leukemia (CML) patients from the Clinic of Haemato-oncology of the University Hospital in Olomouc. The overall survival probability and risk factors of two types of failure (death due to CML and death from other causes) are assessed. Predicted probabilities of the two types of failure with stratification based on the risk factors (Sokal score, haematological response to treatment) are shown. The effect of the Sokal score classification is found to be ambiguous. While the score should identify high- and low-risk CML patients, it seems to be predictive only for failure due to causes other than CML.

1. Introduction 

Methods of survival analysis have become widely used in medical research in the past few decades. Standard survival data (also called time-to-event data) arise in studies where the time from some origin to an end-point is measured. The end-point is defined by the occurrence of a certain event of interest. The time until the specified event occurs can be characterized by several functions. The most widely used are the survival function, representing the probability of an individual surviving up to time t (i.e. the probability that the event has not occurred before t), and the hazard function, representing the rate of occurrence of the event at a given time. Under the assumption of independent censoring, these functions are estimated by the Kaplan-Meier estimator of the survival function and the Nelson-Aalen estimator of the hazard function (for more information, see e.g. [1] or [2]).

In some cases, several causes of failure are possible but the occurrence of one event precludes the occurrence of the other events (e.g. when failures are different causes of death, only the first one can be observed). This situation is known as competing risks. Often, only one event is chosen for analysis, the competing causes of failure are ignored and treated as right-censored observations, and classical survival methods are used for inference [3]. However, this approach leads to a bias in the Kaplan-Meier estimate [4]. The bias is caused by violating one of the fundamental assumptions underlying the Kaplan-Meier estimator: the assumption of independence of the distribution of the time to the event and the censoring distribution. Furthermore, independence between distinct causes of failure cannot be verified on the basis of the observed competing risks data [5]. Specific methods are thus needed for the estimation of survival probabilities. The Cox proportional hazards model may be used for regression analysis, but the interpretation of the results becomes different [4].

This paper presents the competing risks model and statistical methods for nonparametric analysis. The methods are then illustrated on real Chronic Myeloid Leukemia (CML) data from the Clinic of Haemato-oncology of the University Hospital in Olomouc, Czech Republic. All statistical methods are implemented with the R software, using the survival, cmprsk and mstate packages [6].


PhD Conference ’ 11 


20 


ICS Prague 



Jana Fiirstová 


Competing Risks of CML-Related Death ... 


2. Statistical background 

Competing risks are used to model a situation where subjects under investigation are exposed to several causes of failure. If failures represent different causes of death, only the first event to occur is observed. In other settings, second and subsequent failures may be observable, but not of interest. The violation of the assumption of independent censoring, leading to a biased Kaplan-Meier estimator, is an important issue in competing risks models. If the competing event time distributions were independent of the distribution of time to the event of interest, this would imply that at each time the risk of this event is the same for subjects that have not yet failed and are still under follow-up as for subjects that have experienced a competing event by that time [4]. However, a subject that is censored due to failure from a competing risk will certainly not experience the event of interest. Since subjects that will never fail (by the failure of interest) are treated as if they could fail (they are censored), the standard Kaplan-Meier estimator overestimates the probability of failure and underestimates the corresponding survival probability [4], [7].

The competing risks data are represented by the failure time $T$, the failure cause $D$ and a vector of covariates $Z$ ($T$ is assumed to be a continuous and positive random variable, $D$ takes values in the finite set $\{1, \ldots, m\}$). The earlier approach to competing risks used multivariate failure time models. In such models each subject was assumed to have a potential failure time for each type of event. The earliest event was actually observed and the others were latent. This approach focused on the joint distribution of the times $T_1, \ldots, T_m$ of the $m$ different failure types, described by the joint survival function

$$S(t_1, \ldots, t_m) = P(T_1 > t_1, \ldots, T_m > t_m).$$

The marginal hazard function

$$h_j(t) = \lim_{\Delta t \to 0^+} \frac{P(t \le T_j < t + \Delta t \mid T_j \ge t)}{\Delta t}$$

is determined by the marginal survival function

$$S_j(t) = P(T_j > t) = S(0, \ldots, 0, t, 0, \ldots, 0).$$

However, without additional assumptions, neither the joint survival function nor the marginal distributions are identifiable from the observed data [2], [8], [5]. This "latent failure time" approach is thus of little practical use.

A more recent concept in competing risks models is based on the cause-specific hazard function and the cumulative incidence function. These two functions completely specify the joint distribution of $(T, D)$, the failure time and the failure cause [9]. The cause-specific hazard function for the $j$-th cause is defined by

$$\lambda_j(t) = \lim_{\Delta t \to 0^+} \frac{P(t \le T < t + \Delta t, D = j \mid T \ge t)}{\Delta t}$$

for $j = 1, \ldots, m$. It represents the hazard of failing from cause $j$ in the presence of the competing events. The cumulative cause-specific hazard is then defined by

$$\Lambda_j(t) = \int_0^t \lambda_j(u)\, du.$$

The function $S_j(t) = \exp(-\Lambda_j(t))$ should not be interpreted as a marginal survival function unless the competing event time distributions and the censoring distribution are independent (in case of independent censoring, the marginal distribution models the situation when competing events do not occur) [9]. The total hazard $\lambda(t)$ and the overall survival function $S(t)$ are defined in terms of the cause-specific hazards:

$$\lambda(t) = \lim_{\Delta t \to 0^+} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \sum_{j=1}^{m} \lambda_j(t), \qquad S(t) = \exp\Big(-\sum_{j=1}^{m} \Lambda_j(t)\Big).$$

This overall survival function does have an interpretation: it is the probability of not having failed from any cause at time $t$ [3].

The cumulative incidence function of cause $j$, $I_j(t)$, is defined by

$$I_j(t) = P(T \le t, D = j), \quad j = 1, \ldots, m,$$

and represents the probability of a subject failing due to cause $j$ in the presence of all the competing risks. It can be expressed in terms of the cause-specific hazard and the overall survival function as

$$I_j(t) = \int_0^t \lambda_j(u) S(u)\, du, \quad j = 1, \ldots, m. \tag{1}$$

This function is sometimes called the "crude cumulative incidence function" or "subdistribution function". It is not a proper distribution function because the cumulative probability of failing from cause $j$ remains less than unity, as $I_j(\infty) = P(D = j)$ [1]. The standard Kaplan-Meier estimator of the probability of failing due to cause $j$ before or at time $t$ satisfies

$$1 - S_j(t) = \int_0^t \lambda_j(u) S_j(u)\, du, \tag{2}$$

which is similar to the expression for the cumulative incidence function $I_j(t)$. Equations (1) and (2) differ only in that $S(t)$ is replaced by $S_j(t)$. Since

$$S(t) \le S_j(t),$$

it follows that

$$I_j(t) \le 1 - S_j(t),$$

with equality at $t$ if there is no competition, i.e. if

$$\sum_{k=1,\, k \ne j}^{m} \Lambda_k(t) = 0.$$

This shows the bias of the Kaplan-Meier estimator if it is used to estimate $I_j(t)$ [4].

The cumulative incidence function can be estimated using the Kaplan-Meier methodology restricted to specific failures for each cause. Let $0 < t_1 < t_2 < \cdots < t_n$ be the ordered distinct times at which failures of any cause occur. Let $d_{jk}$ denote the number of patients failing from cause $j$ at $t_k$, and let $d_k = \sum_{j=1}^{m} d_{jk}$ denote the total number of failures (from any cause) at $t_k$. Let $n_k$ be the number of patients at risk (i.e. patients still in follow-up who have not failed from any cause) at time $t_k$. Then the cumulative incidence function of cause $j$ at time $t$ is estimated by

$$\hat{I}_j(t) = \sum_{k:\, t_k \le t} \hat{\lambda}_j(t_k)\, \hat{S}(t_{k-1}),$$

where the discretized version of the cause-specific hazard $\lambda_j(t_k) = P(T = t_k, D = j \mid T > t_{k-1})$ is estimated by

$$\hat{\lambda}_j(t_k) = \frac{d_{jk}}{n_k}$$

and

$$\hat{S}(t) = \prod_{k:\, t_k \le t} \Big(1 - \sum_{j=1}^{m} \hat{\lambda}_j(t_k)\Big).$$

A more detailed derivation of this estimator of $I_j(t)$ can be found in [1] and [4].
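The nonparametric estimator described above can be sketched in plain Python and compared with the naive approach that treats competing failures as censored (one minus Kaplan-Meier). This is an illustrative sketch, not the R implementation used in the paper; the function name, the coding of censored subjects as cause 0 and the toy data are assumptions.

```python
def cumulative_incidence(times, causes, n_causes):
    """Return event times, CIF estimates and naive 1 - KM estimates per cause.

    causes[i] is 0 for a censored subject, otherwise the failure cause 1..m.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv_prev = 1.0                                  # overall survival before t_k
    cif = {j: 0.0 for j in range(1, n_causes + 1)}
    km = {j: 1.0 for j in range(1, n_causes + 1)}
    cif_path = {j: [] for j in cif}
    naive_path = {j: [] for j in cif}
    for t_k in event_times:
        n_k = sum(1 for t in times if t >= t_k)      # subjects at risk at t_k
        total_hazard = 0.0
        for j in cif:
            d_jk = sum(1 for t, c in zip(times, causes) if t == t_k and c == j)
            hazard = d_jk / n_k                      # discretized cause-specific hazard
            cif[j] += hazard * surv_prev
            km[j] *= 1.0 - hazard                    # competing causes treated as censored
            total_hazard += hazard
            cif_path[j].append(cif[j])
            naive_path[j].append(1.0 - km[j])
        surv_prev *= 1.0 - total_hazard
    return event_times, cif_path, naive_path

# Hypothetical follow-up times with causes (0 = censored, 1 and 2 = failure types).
times = [2, 3, 3, 5, 6, 7, 8, 9]
causes = [1, 2, 0, 1, 2, 0, 1, 0]
event_times, cif_path, naive_path = cumulative_incidence(times, causes, 2)
```

In this toy data set the naive estimate is never below the cumulative incidence estimate for either cause, in line with the inequality between $I_j(t)$ and $1 - S_j(t)$.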

Consider now a regression model for the cause-specific hazard functions. Since the cause-specific hazard functions are identifiable, regression on these functions is possible and a competing risks analogue of the Cox proportional hazards model becomes a logical choice [2]. It models the cause-specific hazard of cause $j$ for a subject with a covariate vector $Z$ by

$$\lambda_j(t, Z) = \lambda_{0j}(t) \exp(\beta_j^T Z),$$

where $\lambda_{0j}(t)$ is the baseline cause-specific hazard of cause $j$ and $\beta_j$ is a vector of the regression coefficients related to cause $j$. Both the baseline hazards and the regression coefficients are permitted to vary arbitrarily over the $m$ failure types. Again, let $t_{j1} < t_{j2} < \cdots < t_{jk_j}$ denote the $k_j$ times of type $j$ failures, $j = 1, \ldots, m$, and let $Z_{ji}$ be the covariates for the individual that fails at $t_{ji}$. The partial likelihood is constructed by conditioning at each failure time (1) on the previous history of failures and censoring, and (2) on the fact that a single type $j$ failure occurs at time $t_{ji}$ [4]. The partial likelihood function then reads [2]:

$$L(\beta_1, \ldots, \beta_m) = \prod_{j=1}^{m} \prod_{i=1}^{k_j} \frac{\exp(\beta_j^T Z_{ji}(t_{ji}))}{\sum_{l \in R(t_{ji})} \exp(\beta_j^T Z_l(t_{ji}))},$$

where $R(t_{ji})$ is the risk set at time $t_{ji}$. Estimation and comparison of the regression coefficients $\beta_j$ can be carried out by applying asymptotic likelihood techniques individually to the $m$ factors.
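Because the partial likelihood factorizes over the failure types, each factor can be evaluated separately by treating failures from the other causes as censored observations. The sketch below computes the logarithm of one such factor for a single time-fixed scalar covariate without tied failure times; these simplifications, the function name and the toy data are assumptions for illustration.

```python
import math

def log_partial_likelihood(beta, times, causes, covariates, cause):
    """Log of the cause-specific Cox partial likelihood factor for one cause."""
    total = 0.0
    for t_i, c_i, z_i in zip(times, causes, covariates):
        if c_i != cause:
            continue        # censored, or failed from a competing cause
        # Risk set: everyone still under observation just before t_i.
        risk_set = [z for t, z in zip(times, covariates) if t >= t_i]
        total += beta * z_i - math.log(sum(math.exp(beta * z) for z in risk_set))
    return total

# Hypothetical data: cause 0 = censored, binary covariate.
times = [1, 2, 3, 4]
causes = [1, 2, 1, 0]
covariates = [1, 0, 1, 0]
ll0 = log_partial_likelihood(0.0, times, causes, covariates, cause=1)
ll1 = log_partial_likelihood(1.0, times, causes, covariates, cause=1)
```

With beta = 0 every subject in a risk set is equally likely to fail, so the value reduces to minus the sum of the logarithms of the risk set sizes. Here both cause-1 failures carry the covariate value 1, so a positive beta increases the likelihood.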


3. The CML data analysis 

For illustration of the competing-risks techniques, data from the Clinic of Haemato-oncology of the University Hospital in Olomouc are used. The data contain 118 patients suffering from Chronic Myeloid Leukemia (CML). CML is a cancer of the white blood cells. It is a form of leukemia characterized by the increased and unregulated growth of predominantly myeloid cells in the bone marrow and the accumulation of these cells in the blood. The median age [in 1999] is 53 years, but all age groups, including children, are affected [10]. The natural history of CML is progression from a benign chronic phase to a blast crisis within three to five years [11]. Blast crisis is the final phase in the evolution of CML, and behaves like an acute leukemia, with rapid progression and short survival. The blast crisis is often preceded by an accelerated phase, which signals that the disease is progressing and transformation to blast crisis is imminent. Drug treatment can usually stop this progression if started early [10], [11], [12]. In the Czech Republic, there are about 200 newly diagnosed CML patients per year [13].

All 118 patients in the data set were treated in the Olomouc University Hospital in the years 1989-2010. The last admissible date of diagnosis for the analysis was in 2006 in order to have sufficient follow-up time for all the




patients. There is one limitation of the data concerning its consistency: the treatment protocol was changed in 2001 because a new drug, Glivec, had been approved for treatment of the chronic phase of CML. Until 2001, patients were treated with Interferon. For first-line treatment, Interferon was used for all patients in the Olomouc data set (even those diagnosed after 2001), and most of the patients surviving after 2001 were then treated with Glivec. Out of the 118 patients, 67 are males (57%). The age of the patients at the date of diagnosis ranges from 18 to 71, with a mean of 48 years and a median of 50 years. At the date of diagnosis, the Sokal score [14] is evaluated for patients with CML. It identifies low- and high-risk patients according to their age, spleen size and blood cell count. The high-risk group (Sokal score 3) contains 21% of the Olomouc patients (n = 25), the low-risk group (Sokal score 1) covers 39% (n = 46). All other patients were classified with the Sokal score 2. A complete blood count was recorded at the date of diagnosis and the haematological response to the treatment was assessed. Overall, 73 patients (62%) achieved complete haematological response (CHR) to the Interferon treatment. Although other types of failure could be considered as well, the focus of this paper is the overall survival, with the initial point being the date of diagnosis of CML and the terminal point being death. The events of interest (competing risks) are two types of failure: death due to CML (includes accelerating disease, progressive disease and blast crisis), and death from other causes (different types of cancer, graft-versus-host disease after stem cell transplantation, suicide, other). By January 2010, 39 patients (33%) had died: 23 patients died due to CML (20%) and 16 due to other causes (14%). Seventy-nine patients (67%) did not experience any of these events and were censored in January 2010. All the competing risks estimations are made in terms of the overall survival, i.e. time from the diagnosis of CML to death is considered.



Figure 1: Estimates of probabilities of CML-related death and death from other causes, based on Kaplan-Meier (grey) and on cumulative incidence functions (black).


Figure 1 shows the estimates of the probabilities of "CML-related death" and "death from other causes" for all patients. The CML curves are represented as survival curves, while the other event curves are represented as probability distribution functions (one minus survival) for greater clarity. Estimates based on the Kaplan-Meier method are grey, whereas the estimates of the cumulative incidence functions are black. For these data, the two estimates are relatively close to each other, but the difference between the curves is clearly visible. The estimates of the probability of failure based on Kaplan-Meier after 10 years (120 months) of follow-up are P = 0.24 for the CML-related event and P = 0.19 for the other type of event, while the cumulative incidence estimates are P = 0.22 and P = 0.16 for CML and the other type of event, respectively. This illustrates the previously mentioned claim that the Kaplan-Meier estimator overestimates the probability of failure and underestimates the corresponding survival probability.
Figure 2: Cumulative incidence curves of CML-related death and death from other causes. Differences between the curves represent probabilities of the particular events.


Figure 2 shows the estimated cumulative incidence curves again, displayed in a different way: they are stacked. The bottom curve represents the estimate of the cumulative incidence function of CML ($I_{CML}(t)$), the top curve represents the sum of the estimates of the cumulative incidence functions of CML and other types of death ($I_{CML}(t) + I_{other}(t)$). This representation allows an easy comparison of the respective probabilities at any time $t$.



                   Mean    Median    Min    Max
Age (years)        48      50        18     71
Leu (x 10^9/l)     131     86        2      777
Hgb (g/l)          125     126       70     161

Table 1: Basic characteristics of the continuous covariate variables: age, leukocyte count and haemoglobin level at the date of diagnosis.


For the regression analysis on cause-specific hazards, several covariates are used. Basic characteristics of the covariates are shown in Tables 1 and 2. Sex, Sokal score and complete haematological response to treatment (CHR) are categorical variables, whereas age at diagnosis, leukocyte count (Leu) and haemoglobin level (Hgb) at diagnosis are continuous. For the purposes of the analyses, in order to make the interpretation of results easier, these continuous variables were dichotomized. The cut-off levels were set (by the medical staff) to 45 years of age, 50 x 10^9/l of leukocytes and 110 g/l of haemoglobin.




                        N     %
Sex            male     67    57
               female   51    43
Sokal score    1        46    39
               2        46    39
               3        25    21
CHR            yes      73    62
               no       44    37

Table 2: Basic characteristics of the categorical covariate variables. One value is missing in the Sokal score and the complete haematological response to treatment (CHR) variable.
Table 3 reports the results of the univariate Cox regression analysis with the single covariates sex, age, Leu, Hgb, Sokal score and CHR. The blood count appears to have a strong effect on the rate of occurrence of CML-related death. A leukocyte level above 50 negatively affects the overall survival of the CML patients (hazard ratio (HR) = 2.52, p = 0.09), while the effect of a haemoglobin level above 110 is protective (HR = 0.42, p = 0.04). Patients who achieve complete haematological response to treatment are at a lower risk of death due to CML (HR = 0.33, p = 0.01). There is no evidence of any dependence of CML-related death rates on sex, age or the Sokal score. On the other hand, the strongest effect on the rate of occurrence of other causes of death is achieved by the Sokal score. The hazard ratio for each extra point in the Sokal score is 2.74 (p = 0.004). Thus, an individual having Sokal score 3 has a 7.54-times higher risk of death due to other causes compared to an individual having Sokal score 1 (the estimated coefficient is $\beta_{other} = 1.01$, so the two-point difference gives exp(2 x 1.01) ≈ 7.54). The effect of a haemoglobin level above 110 is the same for other causes of death as for the CML-related death: a haemoglobin level above 110 lowers the risk (HR = 0.40, p = 0.08). There seems to be no effect of sex, age, leukocyte count or the achievement of complete haematological response to treatment on the risk of death from causes other than CML. However, the results for the sex covariate are interesting. Although the effects are not statistically significant (p = 0.55 and p = 0.20 for CML and the other type of death, respectively), they are opposite for the two types of failure. In the case of CML-related death, males may be at higher risk than females (HR = 1.30), while in the case of other types of death, the hazard ratio for males relative to females is 0.52. Sex is the only covariate with such opposite effects on the two types of failure. In the multivariate Cox regression model, no combination of the above-mentioned six covariates proves to have a statistically significant effect on the risk of failure due to any of the competing risks.

                 CML                          other
                 exp(beta_CML)    p-value     exp(beta_other)    p-value
Sex (male)       1.30             0.55        0.52               0.20
Age (> 45)       1.40             0.46        1.43               0.51
Leu (> 50)       2.52             0.09        2.31               0.19
Hgb (> 110)      0.42             0.04        0.40               0.08
Sokal score      1.43             0.19        2.74               0.004
CHR (yes)        0.33             0.01        0.81               0.70

Table 3: Relative risk estimation for the CML-related death and death from other causes with single covariates.
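The 7.54-fold risk quoted above for Sokal score 3 relative to Sokal score 1 can be checked from the reported coefficient. The short sketch below only reproduces this arithmetic; the variable names are illustrative.

```python
import math

beta_other = 1.01                             # estimated coefficient reported in the text
hr_per_point = math.exp(beta_other)           # hazard ratio for one extra Sokal point
hr_score3_vs_1 = math.exp(2 * beta_other)     # two-point difference: score 3 versus score 1
```

Because the coefficient is itself rounded, exponentiating 1.01 reproduces the tabulated per-point hazard ratio of 2.74 only approximately, while the two-point ratio rounds to the quoted 7.54.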


Figure 3: Predicted cumulative incidence functions for CML-related death (left) and death from other causes (right), for patients with and without complete haematological response to treatment, based on the proportional hazards model for the cause-specific hazards.
Figure 4: Predicted cumulative incidence functions for CML-related death (left) and death from other causes (right), for the Sokal score classification, based on the proportional hazards model for the cause-specific hazards. The horizontal axis of both panels is time (months).


Based on the results of the Cox regression, predicted cumulative incidence curves can be obtained. Figures 3 and 4 show the predicted occurrence of CML-related death and death from other causes for the groups of patients with and without complete haematological response to treatment and for the Sokal score classification. For the CML-related death, the CHR achievement has a strong protective effect: the predicted probabilities of failure due to CML after ten years (120 months) are P = 0.15 and P = 0.38 for the "CHR yes" and the "CHR no" groups, respectively. On the other hand, there seems to be no relationship between the CHR outcome and failure due to causes other than CML, which is to be expected. For both CHR groups, the predicted probability of death from other causes ten years after the diagnosis is relatively low (P = 0.15). The effect of the Sokal score classification is ambiguous. While the score should identify high- and low-risk CML patients, it seems to be predictive only for failure due to causes other than CML. The predicted probabilities of death from other causes after ten years are P = 0.35 and P = 0.07 for the Sokal score 3 group and the Sokal score 1 group, respectively. The predicted probabilities of death from CML after ten years are much closer to one another: P = 0.28 for Sokal score 3 and P = 0.18 for Sokal score 1. Other predicted cumulative incidence curves are not presented here, as they can easily be obtained from the results of the Cox regression (see Table 3).

5. Conclusion

The competing risks model and statistical methods for nonparametric analysis are recalled in this paper. The bias in the standard Kaplan-Meier estimator and the need for specific methods for inference on competing risks data are explained. The data set of Chronic Myeloid Leukemia (CML) patients from the Clinic of Haemato-oncology of the University Hospital in Olomouc is analyzed. The overall survival probability and risk factors of two types of failure (death due to CML and death from other causes) are assessed. The interesting role of sex and the Sokal score classification in the overall survival of the CML patients is discussed. Predicted probabilities of the two types of failure with stratification based on the chosen risk factors are shown. The effect of the Sokal score classification is found to be ambiguous. While the score should identify high- and low-risk CML patients, it seems to be predictive only for failure due to causes other than CML. The use of the Sokal score should be considered more thoroughly.
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Abstract 

The article deals with a family of diversity measure functions known as traditional measures of diversity. We deal
with sample estimates of traditional measures of diversity, develop a new estimator and compare its behavior to
two well-established estimators in a simulation study. We also introduce a function that can be used to evaluate the
sensitivity of a given diversity measure to changes in a population.

The paper will be presented at the 7th Summer School on Computational Biology and will be published in the
conference proceedings.
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Abstract 


This thesis analyzes the current state of use
of biometrics in computer security. It provides
an overview of the most commonly used
anatomical-physiological and behavioral
biometric identification methods. The result of the
work will be a new set of methods which allows
reliable identification of the user in the most
comfortable way. These new principles of data
security will be used to enhance the protection of
a specialized health record. This will contribute to
the expansion of the generally conceived EHR MUDR
concept to other application areas.


1. Introduction 

Biometrics, biometric identification and verification
have been investigated since the early 1980s. At the end
of the 20th century the first applications began to
emerge, especially in forensic practice, where
biometrics was represented by automated processing of
fingerprints and palm prints found at a crime scene.
Nowadays, biometric methods are irreplaceable both in
the forensic sciences and in commercially available
applications.

In this paper we analyze the current state of use of
biometrics in computer security, especially the
possibilities of identification based on biometric data.
Biometric characteristics can be divided into
anatomical-physiological and behavioral.


2. Anatomical-physiological biometric characteristics

The most frequently used anatomical-physiological
biometric characteristics in common practice are
fingerprints, palm prints, geometry of hand shape and
scanning of bloodstream patterns of the palm or the back
of one’s hand.

2.1. Fingerprints and palm prints 

Fingerprints and palm prints are based on the uniqueness
of ridge patterns. Miniaturization of sensors and
processors allows fingerprint-based biometric
identification to be used commercially on a large scale.

In practice, fingerprints are often used for authentication
of persons accessing computers or communication
devices, for enhancing the protection of identification or
credit cards, for authorization to access buildings and for
protection of precious or dangerous devices from
unauthorized use.

Interactive fingerprinting, which is now often implemented
in a variety of technical equipment, is done by means
of sensors. These sensors may be contact or contactless
and their functions can be based on different physical
principles [2].

2.1.1 Contact fingerprint sensors: Contact sensors
include optical, electronic, optoelectronic, capacitive,
pressure and temperature sensors. Some of these sensor
types will be described in detail below. The main
advantages and disadvantages of each sensor type are
summarized in Table 1.

| Sensor | Advantages | Disadvantages |
|---|---|---|
| Optical contact sensors | very quick; user-friendly | not resistant to dirt; not hygienic; don’t recognize living tissue |
| Electronic contact sensors | resistant to dirt; very quick; user-friendly | not hygienic; don’t recognize living tissue |
| Capacitive contact sensors | very quick | not resistant to dirt; don’t recognize living tissue; not hygienic |
| Temperature contact sensors | recognize living tissue; very quick | not hygienic |
| Optical non-contact sensors | resistant to dirt; hygienic; very quick | don’t recognize living tissue |
| Ultrasonic non-contact sensors | resistant to dirt; hygienic; very quick | don’t recognize living tissue |

Table 1: Comparison of contact and non-contact fingerprint sensors.


Optical contact sensors: Optical sensors are based
on FTIR technology (Frustrated Total Internal
Reflection). This means that a laser beam illuminates the
bottom surface of a finger that touches a transparent
sensor plate. The reflected light is then captured by a
CCD (Charge-Coupled Device) element. The amount of
reflected light depends on the depth of papillary lines
and furrows: papillary lines reflect more light than
furrows.

Other optical sensors use a thick bundle of optical
fibers that are perpendicular to the plane of the
sensor. Here again, the method of exposure and
reflection of light is applied. Another type of sensor
uses CMOS technology (Complementary
Metal-Oxide-Semiconductor).


Electronic contact sensors: Electronic sensors operate
on the principle of an electric field between two
parallel, conductive and electrically charged plates (see
Figure 1). If the shape of the originally flat plate on
top changes to wavy (papillary lines and furrows), the
shape of the electric field changes too. The upper plate
of the sensor is represented by the surface of the skin,
which is connected to the source of the reference
electrical signal.

The main advantage of this sensor is that it does not scan
only the surface of the skin but it scans deeper skin
layers too. This means that this type of sensor is
resistant to dirt and possible damage of the skin surface.

Optoelectronic contact sensors: Optoelectronic
sensors consist of two layers. The upper layer is in
contact with the skin and it is able to emit light. This
light is captured in the second glass layer, in which
photodiodes are sealed. These photodiodes convert the
light into an electrical impulse.



Capacitive contact sensors: Capacitive sensors capture
a fingerprint by measuring electrical capacitance (see
Figure 2). The scanning sensor is composed of a large
number of scanning surfaces that are isolated from each
other. When the finger touches the sensor, papillary
lines bridge the conductive pads while furrows act as
isolators. The shape of the papillary pattern therefore
modulates voltage and capacitance drops between the
conductive pads. These drops are measured and they form
a digitized picture of the papillary pattern.

These sensors are highly sensitive to various types of
dirt that may significantly alter the conductivity of the
skin.


Figure 1: Simplified diagram of the electronic sensors
(according to [8]).


Pressure contact sensors: Pressure sensors respond
to the pressure of papillary lines on the surface of the
sensor. The sensor surface is made of an elastic
piezoelectric material that transforms the pressure into
an electrical signal and thus creates a picture of the
fingerprint.

Figure 2: Simplified diagram of the capacitive sensors
(according to [8]).

Temperature contact sensors: Temperature sensors
react to temperature differences between papillary lines
and furrows. A great advantage of this sensor is that
temperature is an important factor that can tell whether
the scanned fingerprint belongs to a living person.

2.1.2 Contactless sensors for fingerprint: 

The best-known groups of non-contact sensors include 
optical and ultrasound sensors. The main advantages 
and disadvantages of these sensors are also included in 
Table 1. 

Optical non-contact sensors: The principle of optical
non-contact sensors is similar to that of the optical
contact sensors described above, with only one
difference: the beam of light allows scanning from a
distance of 3-5 cm.

The greatest advantage of this sensor is that it prevents 
contamination caused by contact with dirty fingers. 



Figure 3: Simplified diagram of the ultrasonic sensors
(according to [8]).

Ultrasonic non-contact sensors: Ultrasonic sensors
are also based on a similar principle as the optical ones,
but instead of a light beam, a beam of short mechanical
waves (ultrasound) is reflected from the skin
surface (see Figure 3). This type of sensor eliminates all
the disadvantages of the previous types of sensors
explained above [1].

2.2. Geometry of hand shape 

Another frequently used method is the geometry of hand
shape, the essence of which is the measurement of lengths
and widths of fingers, bones or joints of the hand (see
Figure 4). The hand touches a horizontal scanner that
has special fixation pins. These ensure that the hand is
always in the same position. The scanner captures one
image from the top (perpendicularly to the sensor
board) and one image from the side. This generates two
monochrome images of the ‘hand silhouette’.



Figure 4: The basic principle of hand geometry (according
to [8]).

First, a user who needs to prove his or her identity
enters an identification number (PIN) via a keyboard or
presents a magnetic stripe, a chip or a card to a
reader. Then the user puts his or her hand in a specified
position according to the visual instructions on the
keyboard of the scanner [5]. Hand geometry scanners are
now common in many areas including healthcare.

2.3. Scanning of the bloodstream of the palm or the 
back of hand 

Another method suitable for use in computer security is
scanning of the bloodstream of the palm or the back of
one’s hand. A CCD camera, which is most commonly
used in this case, takes a picture of the hand, and the
specific pattern of blood vessel distribution captured in
the image is then used to identify the person.

An unquestionable advantage of this method is that it
also verifies whether the tested object is alive. The
scanning runs in the infrared band, which is sensitive to
temperature. This method takes advantage of the fact that
blood vessels in the body are warmer than their
surroundings. The scanned image is further processed in a
similar way to a fingerprint (with the shape of vessels
being compared).

Another advantage in comparison to scanning of hand
geometry is that it is not necessary to place the hand in
the scanner in the same position every time.

Other options for this method are to scan the
bloodstream of the palm or to perform non-contact scanning
of both the palm and the back of the hand, which provides
a high level of hygiene, unlike hand geometry scanning or
fingerprints [5].

2.4. Scanning of face and its parts 

Instead of hands, a face or a part of the face can be used
to identify a person as well. There are computer
programs that can recognize human faces as the human brain
does. Face recognition is now typical especially in
criminology, and there are many different methods and
algorithms used for these purposes.

This method can also be easily used to secure common
computing and telecommunication systems. Any standard
video camera, which can already be found integrated
in many screens, is sufficient to take the image of
the face. The face scan can replace traditional password
entry. A great advantage of this method is that there is
absolutely no need for direct contact between the user
and the sensor [10].

However, face recognition can be further improved in 
many ways. As an example, we can register signs of 
emotions. 

An interesting application of this method in IT security
suggests itself. Continuous face scanning during
work with the computer would make it possible to evaluate
whether it is still the same person accessing sensitive
data. Not only does this method secure the system at
the time of login, it can also protect the data later on,
for example when the authorized user leaves the
unlocked terminal for a period of time.

2.5. Scanning of iris or retina 

Recently, thanks to its simple implementation using only
conventional video systems, scanning of the iris or retina
has been becoming a more widespread method of
identification. Iris recognition is possible regardless of
size, location and orientation but it requires a
complicated algorithm. This method is, therefore, usually
used only to ensure a high level of security [5].


A light beam is used to map the bloodstream in the
retina. A part of the beam is absorbed by the retina while
the other part is reflected. The special camera that is
required for the scanning is expensive and the scanning
process itself is not very user-friendly (many people are
afraid of the technology) [5].

3. Behavioral biometric characteristics 

Keystroke dynamics could be an interesting behavioral
biometric characteristic for use in computer security,
although it has not been widely used so far.

3.1. Keystroke dynamics 

Keystroke dynamics allows so-called continuous (dynamic)
verification, which is based on the use of the keyboard
as a medium of continuous interaction between user and
computer. This offers a possibility of continuous control
over the whole time the computer is being used. This
method is useful in situations when there is a risk of
leaving a computer unattended for a while [3].

The most common characteristic is the time at which
individual keys are pressed or the duration of each
keypress. Another possibility is to measure typing speed,
frequency of errors, style of writing capital letters or
the force used to press the keys. This latter type
requires a special keyboard that allows the force of the
push to be measured. All other characteristics can be
evaluated by a special program without any modification of
hardware [4,6].
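
As a rough illustration of the timing characteristics described above, the following Python sketch computes keypress durations and the intervals between consecutive key presses from a list of key events. The event format (key, press time, release time, all in seconds) and the function name `keystroke_features` are assumptions made for this example; they do not correspond to any particular keystroke-logging API or to the system proposed in this paper.

```python
from statistics import mean

def keystroke_features(events):
    """Compute simple timing features from key events.

    `events` is a list of (key, press_time, release_time) tuples sorted
    by press_time; times are in seconds (hypothetical input format).
    """
    # Duration of each keypress (release time minus press time).
    durations = [release - press for _, press, release in events]
    # Time between the presses of consecutive keys.
    press_gaps = [
        events[i + 1][1] - events[i][1] for i in range(len(events) - 1)
    ]
    # Overall typing speed over the whole recorded sequence.
    total_time = events[-1][2] - events[0][1]
    return {
        "mean_duration": mean(durations),
        "mean_press_gap": mean(press_gaps) if press_gaps else 0.0,
        "keys_per_second": len(events) / total_time if total_time > 0 else 0.0,
    }

# Three illustrative (made-up) key events.
sample = [("p", 0.00, 0.10), ("a", 0.25, 0.33), ("s", 0.50, 0.62)]
features = keystroke_features(sample)
```

A feature vector of this kind could then be compared with a stored user profile; how that comparison is done is outside the scope of this sketch.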

4. Comparison of the methods 

Most current data security systems verify the user’s
authorization to access the system only at the time of
login. When user identification is based only on
biometric data, in most cases only one biometric
component (or just a few of them) is used for
verification.

A solution should preferably include the methods
mentioned in the introduction and emphasize those of
them which prove to be stable in the long term or
the least disturbing for staff. The method must be fast
enough for the user. Hardware requirements and the
required processing power will also be considered.

Table 1 shows the main advantages and disadvantages
of different types of contact and contactless sensors for
fingerprinting. All sensors for fingerprinting are
relatively quick and easy to use in comparison to other
biometric methods.

| Sensor | Advantages | Disadvantages |
|---|---|---|
| Geometry of hand shape | resistant to dirt | don’t recognize living tissue; require scanning in the same position; not hygienic |
| Contactless scanning of bloodstream | don’t require scanning in the same position; recognize living tissue; hygienic; resistant to dirt | no possibility of continuous control |
| Scanning of the face | resistant to dirt; recognize living tissue; don’t require scanning in the same position; possibility of continuous control | time-consuming |
| Scanning of iris | resistant to dirt; don’t require scanning in the same position; user-friendly | |
| Scanning of retina | resistant to dirt; don’t require scanning in the same position | not user-friendly; time-consuming |
| Keystroke dynamics | user-friendly; possibility of continuous control; hardware-efficient | |

Table 2: Comparison of anatomical-physiological and behavioral biometric characteristics.


The main differences are in resistance to dirt, which is
important for the following two reasons. The first one is
that the sensor should be able to work even when there
is dirt on its surface or on the surface of the finger that
is being scanned. The second reason is, of course, the
hygienic aspect.

The greatest benefit is the sensor’s ability to distinguish
living tissue from dead or synthetic material, which makes
it very resistant to possible abuse.

Table 2 displays the main advantages and disadvantages of
other anatomical-physiological and behavioral
characteristics. Besides the aspects mentioned above, we
also compared the possibility of continuous
authentication, the need for scanning in the same position
and difficulty/ease of use.

Table 3 compares selected methods in terms of stability
of biometric characteristics and time consumption.
Data in the table are not accurate readings but empirical
estimates. The table shows that there is no method that
would be “ideal”, i.e. would offer high stability
of biometric characteristics and low time consumption.
Iris scanning, which is currently not used in everyday
practice, is close to this ideal.


5. Application of selected methods in electronic health
record security


The aim of this work is to propose a multifactor system
that will verify a number of biometric features
simultaneously, thus ensuring greater reliability of
identification. This will protect access to patient data
in the electronic record of personal identification ERPI,
which is conceptually based on the proposal of the
Universal Electronic Health Record MUDR, see [7].

Security of patient data is one of the key issues in
telemedicine. It may appear that this calls for a standard
solution using the principles of the electronic record EHR
MUDR. But unlike our task, the concept of the EHR MUDR
record is designed with respect to ordinary patient data,
accessed during everyday hospital operation.

In contrast, the electronic record of personal
identification ERPI will contain much more sensitive data
related to the identification of individuals from
different perspectives. For this reason there is also
a demand for a higher level of identification of persons
accessing the data.

| Method | Stability of biometric characteristics | Time consumption |
|---|---|---|
| (scale) | high = more than 80 %, medium = more than 60 %, low = less than 60 % | high = more than 3 sec, medium = less than 3 sec, low = less than 1 sec |
| Fingerprint | medium | low |
| Geometry of hand shape | medium | medium |
| Scanning of bloodstream | medium | medium |
| Scanning of the face | low | high |
| Scanning of iris | high | medium |
| Scanning of retina | high | high |
| Keystroke dynamics | low | low |

Table 3: Comparison of methods in terms of stability of biometric characteristics and time consumption.
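
The category boundaries given in the header of Table 3 can be written down as a small Python sketch. The function names are hypothetical, and the treatment of values lying exactly on a boundary (60 %, 80 %, 1 s, 3 s) is an assumption, since the table does not specify it.

```python
def stability_category(stability_percent):
    """Map stability of a biometric characteristic (in percent)
    to the categories used in Table 3."""
    if stability_percent > 80:
        return "high"
    if stability_percent > 60:
        return "medium"
    return "low"  # assumption: exactly 60 % is counted as low

def time_category(seconds):
    """Map the time needed for one verification (in seconds)
    to the categories used in Table 3."""
    if seconds < 1:
        return "low"
    if seconds < 3:
        return "medium"
    return "high"  # assumption: exactly 3 s is counted as high
```

Since the data in Table 3 are empirical estimates rather than accurate readings, such a mapping is only meant to make the scale explicit.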

With regard to the nature of such data it appears
necessary to use a set of DLP (Data Loss Prevention) tools
allowing identification of the risks associated with the
loss of sensitive data and possible dynamic reduction of
these risks. Moreover, with regard to the type of
sensitive identification data it is useful to have a
resource that will allow a subsequent audit of the data.

Commercial solutions such as RSA or Websense are
available. These sets are designed to reduce the impact
of potential risks, irrespective of whether the data are
stored in the datacenter, transmitted over the network
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Abstract 

We study some basic properties of Hilbert-style
propositional calculi with the rule of condensed
detachment instead of modus ponens and
substitution. The rule of condensed detachment,
proposed by Carew A. Meredith, can be seen as
a version of modus ponens with the “minimal”
amount of substitution.

1. Introduction 

Hilbert-style calculi for various propositional logics
have been studied by prominent logicians, including
Łukasiewicz and Tarski, and historically constitute a
well-established branch of mathematical logic. These
calculi are usually equipped with the rules of detachment,
which we shall prefer to call modus (ponendo) ponens, and
substitution.* One of the logicians who significantly
contributed to the study of such calculi was Carew A.
Meredith. In the 1950’s, he proposed, cf. [1], the rule of
condensed detachment as a rule which combines modus ponens
with a “minimal” amount of substitution, cf. [2].

The general idea behind the rule of condensed detachment
is that from two formulae $\varphi \to \psi$ and $\chi$,
such that there is a most general unifier $\sigma$ of
$\varphi$ and $\chi$, we derive $\sigma(\psi)$. However,
this brief version does not contain some important
technical details which will be discussed later in the
paper, see Definition 2.1.

The use of unification in the definition of condensed
detachment suggests its connection with binary resolution,
cf. [3]. However, the original formulation did not use
unification, which was proposed by Robinson [4] in the
1960’s. There is also a very tight connection with
combinatory logic, cf. [2].


It is usually claimed that one of the main advantages
of condensed detachment over the rules of modus ponens
and substitution is an economic presentation of
proofs. The reason is that the result of an application of
condensed detachment is unique (up to variable renaming)
and a proof can be presented as a sequence of
axioms; there is no need to write substitutions. In this
paper we discuss some interesting questions which
arise if we replace the rules of modus ponens and
substitution in Hilbert-style propositional calculi solely
by the rule of condensed detachment. Although condensed
detachment may seem to be a toy tool, there are some
rather interesting applications, e.g. in proof complexity
[5], see Section 3.2.

The paper is organised as follows. In Section 2 we define
some basic notions including the rule of condensed
detachment. In Section 3 we prove Theorem 3.1, which
connects proofs using the rule of condensed detachment
and proofs using the rules of modus ponens and
substitution. Also the uniqueness of application of
condensed detachment concerning the number of different
formulae provable from a finite set of axioms by proofs of
some maximal given length is discussed in Section 3.1.
In Section 4 the notion of D-completeness of a set of
axioms $A$, which means that the very same formulae are
provable by condensed detachment as by modus ponens
and substitution in $A$, is studied and some basic
properties are proved.

We would like to note that most of the results in this
paper, although mainly (re)discovered independently,
are implicitly or explicitly discussed in several papers
on condensed detachment, cf. [2,3,6]. These papers also
influenced the presentation given here.


* Since axiom schemata are sometimes used instead of axioms, the rule of substitution is in these cases only implicitly present.
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2. Preliminaries 

We fix a countably infinite set of variables $Var =
\{p, q, r, \dots\}$. The set of formulae $Fml$ is defined
in the standard way: any variable from $Var$ is an element
of $Fml$, if $\varphi, \psi \in Fml$ then also
$(\varphi \to \psi) \in Fml$, and nothing else is a member
of $Fml$. Hence the only connective we are interested in
is implication. The reason for this is that all the things
we want to discuss become apparent already in
implicational fragments. We usually denote formulae by
$\varphi$, $\psi$, and $\chi$. The outermost brackets are
mostly omitted.

A substitution $\sigma$ is a function $\sigma\colon Var
\to Fml$. We say that a substitution $\sigma$ is a
renaming if $\sigma\colon Var \to Var$ is a bijection. The
result of an application of a substitution $\sigma$ on a
formula $\varphi$, denoted $\sigma(\varphi)$, is the
formula obtained by replacing variables in $\varphi$
according to $\sigma$ simultaneously. A composition of
substitutions $\sigma\colon Var \to Fml$ and
$\delta\colon Var \to Fml$ is the substitution
$\sigma \circ \delta = \{ (p, \psi) \mid (\exists \psi')
((p, \psi') \in \sigma \text{ and } \psi = \delta(\psi')) \}$.
The empty substitution is denoted $\varepsilon =
\{ (p, p) \mid p \in Var \}$. In this paper substitutions
are denoted $\sigma$, $\delta$, $\theta$, $\rho$, and
$\zeta$. Instead of using ordered pairs we write a
substitution as a set of pairs $p/\psi$, usually writing
only the important ones, meaning that the substitution is
defined as the empty substitution on the other variables.
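
The notions of application and composition can be made concrete with a short Python sketch. The representation is an assumption chosen for this illustration only: a variable is a string, an implication $\varphi \to \psi$ is a pair `(phi, psi)`, and a substitution is a dictionary listing only the "important" pairs, acting as the empty substitution elsewhere. The function names `apply` and `compose` are hypothetical.

```python
def apply(sub, formula):
    """Apply the substitution `sub` to `formula` simultaneously."""
    if isinstance(formula, str):            # a variable
        return sub.get(formula, formula)    # identity on unlisted variables
    left, right = formula                   # an implication left -> right
    return (apply(sub, left), apply(sub, right))

def compose(sigma, delta):
    """Return sigma o delta in the sense used here: first sigma, then delta."""
    # For variables listed in sigma: apply delta to the image under sigma.
    result = {p: apply(delta, f) for p, f in sigma.items()}
    # On all other variables sigma is the identity, so delta acts alone.
    for p, f in delta.items():
        result.setdefault(p, f)
    return result

# Illustrative substitutions sigma = {p/(q -> r)} and delta = {q/s, p/t}.
sigma = {"p": ("q", "r")}
delta = {"q": "s", "p": "t"}
phi = ("p", ("q", "p"))                     # p -> (q -> p)
composed = compose(sigma, delta)
```

With this convention, applying `composed` to a formula gives the same result as applying `sigma` and then `delta`.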

A formula $\psi$ is a variant of a formula $\varphi$,
abbreviated by $\psi \sim \varphi$, if there is a renaming
$\sigma$ such that $\psi = \sigma(\varphi)$, i.e.
$\varphi = \sigma^{-1}(\psi)$. Moreover, we say that a
substitution $\sigma$ is a variant of a substitution
$\delta$ if there is a renaming $\theta$ such that
$\sigma = \delta \circ \theta$, i.e.
$\delta = \sigma \circ \theta^{-1}$.

A unification of a set of formulae $F = \{\varphi_1,
\dots, \varphi_n\}$ is a substitution $\sigma$ such that
$\sigma(\varphi_1) = \dots = \sigma(\varphi_n)$. If such
a substitution exists we say that $F$ is unifiable. Due to
the Unification Theorem of Robinson [4], for any
unifiable set of formulae $F$ there exists a most general
unifier of $F$. A most general unifier (m.g.u.) $\sigma$
of $F$ is a unification such that for any other
unification $\delta$ of $F$, there is a substitution
$\theta$ such that $\sigma \circ \theta = \delta$. All the
most general unifiers, if they exist, are the same up to
renaming, i.e. they are variants of each other. Since this
difference will be unimportant for us we shall write the
m.g.u. instead of a m.g.u.

2.1. Hilbert-style calculi 

In this paper we study Hilbert-style propositional
calculi. A Hilbert-style calculus consists of a set of
axioms $A$, which is just a set of formulae, and deduction
rules. The following axioms are discussed in the paper:

(B) $(p \to q) \to ((r \to p) \to (r \to q))$,

(B') $(p \to q) \to ((q \to r) \to (p \to r))$,

(C) $(p \to (q \to r)) \to (q \to (p \to r))$,

(I) $p \to p$,

(K) $p \to (q \to p)$,

(W) $(p \to (p \to q)) \to (p \to q)$,

(P) $((p \to q) \to p) \to p$.

The names of the axioms are based on the corresponding
combinators in combinatory logic, with the exception of
(P), which stands for Peirce’s law. We can present a set of
axioms by listing the axioms it contains, e.g. BCK denotes
the set containing (B), (C), and (K).

We shall use only three deduction rules: modus ponens,
substitution, and condensed detachment. The rule of modus
ponens (or detachment) derives $\psi$ from
$\varphi \to \psi$ and $\varphi$. The rule of substitution
derives $\sigma(\varphi)$ from $\varphi$ for any
substitution $\sigma$.

Definition 2.1 (Condensed Detachment) Let us have
two formulae $\varphi \to \psi$ and $\chi$. We produce a
variant of $\chi$ called $\chi'$, which does not have a
common variable with $\varphi \to \psi$. If there is the
m.g.u. $\sigma$ of $\varphi$ and $\chi'$, then produce a
variant $\sigma'$ of $\sigma$ such that no new variable in
$\sigma'(\varphi)$ occurs in $\psi$. The condensed
detachment of $\varphi \to \psi$ and $\chi$, denoted
$D(\varphi \to \psi)\chi$, is $\sigma'(\psi)$. Otherwise,
the condensed detachment of $\varphi \to \psi$ and $\chi$
is not defined.

Note. For technical reasons it is sometimes useful to
define condensed detachment not only for formulae
containing implication but also for variables. In this
case, the condensed detachment of $p$, which is a
variable, and $\chi$ is defined as $p$, cf. [2].

Remark. It is evident that the condensed detachment of
$\varphi$ and $\psi$ is defined uniquely up to variants
(renaming). Thus we shall write that
$D\varphi\psi \sim \chi$. When the rule of condensed
detachment is the only rule we shall also sometimes write
$\varphi\psi \sim \chi$.

As Definition 2.1 is quite technical, we discuss the
whole process of an application of condensed detachment
in detail. First, we produce a variant $\chi'$ of $\chi$
with no common variable with $\varphi \to \psi$. To see
why, consider $\varphi = p \to p$ and $\chi = p$: there
would be no unification of $p \to p$ and $p$. Moreover,
if we had $\varphi = p \to p$, $\psi = q \to q$ and
$\chi = q$, the condensed detachment of
$\varphi \to \psi$ and $\chi$ would be
$(p \to p) \to (p \to p)$.

Another important technical aspect is that the definition
requires us to produce a variant $\sigma'$ of $\sigma$
(note that $\sigma'$ is also the m.g.u. of $\varphi$ and
$\chi'$) which satisfies
$(Var(\sigma'(\varphi)) \setminus Var(\varphi)) \cap
Var(\psi) = \emptyset$. If this condition was not
satisfied we would get a result that would not be the most
general one.

A proof of $\varphi$ in $A$ is a finite sequence of
formulae $\psi_1, \dots, \psi_n$, where
$\psi_n = \varphi$, with the following properties. Every
element is a member of $A$ or is derived from
the preceding elements of the sequence by a deduction
rule. In this paper we study MP-proofs, which have modus
ponens and substitution as their only deduction rules,
and D-proofs, which have condensed detachment as
the only deduction rule.

If there is a D-proof (MP-proof) of $\varphi$ in $A$ we
say that $\varphi$ is D-provable (MP-provable) in $A$.
Since we already pointed out that the result of an
application of condensed detachment is unique up to
variants, we mostly do not mention that if $\varphi$ is
D-provable in $A$ then also all the variants of $\varphi$
are D-provable in $A$ etc.

It is worth pointing out that all the MP-provable formulae
in BCI, BCK, BCKW, and BCKWP correspond to the
logics BCI, BCK, the implicational fragment of
intuitionistic propositional logic, and the implicational
fragment of classical propositional logic, respectively.

Example 2.1 We prove I in CK by condensed detachment.
The proof can be described as (CK)K, which
means that we use condensed detachment on C and K
and then on the result and K.

Since C = $(p \to (q \to r)) \to (q \to (p \to r))$ and
K = $p \to (q \to p)$, we produce a variant of K, e.g.
$s \to (t \to s)$. There is the m.g.u.
$\sigma = \{r/p, s/p, t/q\}$ of $p \to (q \to r)$ and
$s \to (t \to s)$, which satisfies that no new variable in
$\sigma(p \to (q \to r))$ occurs in $q \to (p \to r)$. It
follows that CK $\sim \sigma(q \to (p \to r)) =
q \to (p \to p)$.

Now we can use $q \to (p \to p)$ and any provable
formula, e.g. K, to prove I. We produce a variant of K,
e.g. again $s \to (t \to s)$. There is the m.g.u.
$\tau = \{q/s \to (t \to s)\}$ of $q$ and
$s \to (t \to s)$. Moreover, $s$ and $t$ do not occur in
$p \to p$. It follows that
(CK)K $\sim \tau(p \to p) = p \to p$.
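
The two steps of Example 2.1 can be replayed mechanically. The following Python sketch is a minimal illustration under stated assumptions: variables are strings, an implication is a pair, all function names are hypothetical, and the case of a variable as the major premise (see the Note after Definition 2.1) is simply left undefined. Because the unifier computed here only uses variables of its two inputs, and the minor premise is renamed apart first, the condition on new variables in Definition 2.1 holds automatically.

```python
def apply(sub, f):
    """Apply a substitution (dict) to a formula."""
    if isinstance(f, str):
        return sub.get(f, f)
    return (apply(sub, f[0]), apply(sub, f[1]))

def occurs(var, f):
    if isinstance(f, str):
        return var == f
    return occurs(var, f[0]) or occurs(var, f[1])

def mgu(a, b):
    """Most general unifier of a and b as a dict, or None."""
    sub, stack = {}, [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = apply(sub, x), apply(sub, y)
        if x == y:
            continue
        if isinstance(x, str) or isinstance(y, str):
            if not isinstance(x, str):
                x, y = y, x
            if occurs(x, y):
                return None
            sub = {v: apply({x: y}, g) for v, g in sub.items()}
            sub[x] = y
        else:
            stack.append((x[0], y[0]))
            stack.append((x[1], y[1]))
    return sub

def rename(f, prefix):
    """Variant of f with variables prefix0, prefix1, ... in order of
    first occurrence; also used as a canonical form up to variants."""
    names = {}
    def walk(g):
        if isinstance(g, str):
            return names.setdefault(g, f"{prefix}{len(names)}")
        return (walk(g[0]), walk(g[1]))
    return walk(f)

def condensed_detachment(major, minor):
    """Condensed detachment of major = phi -> psi and minor, or None."""
    if isinstance(major, str):
        return None                  # variable case not covered in this sketch
    phi, psi = rename(major, "a")
    chi = rename(minor, "b")         # variant sharing no variable with major
    sigma = mgu(phi, chi)
    if sigma is None:
        return None
    return rename(apply(sigma, psi), "v")

C = (("p", ("q", "r")), ("q", ("p", "r")))
K = ("p", ("q", "p"))
CK = condensed_detachment(C, K)      # a variant of q -> (p -> p)
CKK = condensed_detachment(CK, K)    # a variant of p -> p, i.e. I
```

Results are returned in a canonical renaming, so two formulae can be compared up to variants by comparing `rename(f, "v")` for each of them.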

3. Condensed detachment 

It is obvious that condensed detachment can be simply
simulated by modus ponens and substitution. As the
idea behind the rule of condensed detachment is to be a
version of modus ponens equipped with the “minimal”
amount of substitution, we would expect that there is
also some connection in the other direction. This
connection was probably first explicitly shown in [3] by
Kalman.

Theorem 3.1 Let $A$ be a set of axioms and $\mathcal{P}$
be an MP-proof in $A$. Then there is a D-proof
$\mathcal{P}'$ in $A$ such that every step in
$\mathcal{P}$ is a substitution instance of a step in
$\mathcal{P}'$. Moreover, $\mathcal{P}'$ is not longer
than $\mathcal{P}$.

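Proof: By induction on the length of the proof
$\mathcal{P}$. If $\mathcal{P} = \psi_1$ then
$\psi_1 \in A$ and hence $\mathcal{P}' = \psi_1$. Assume
that the claim holds for $n$; we shall prove it for
$n + 1$. This means we have an MP-proof
$\mathcal{P} = \psi_1, \dots, \psi_n, \psi_{n+1}$ and a
D-proof $\mathcal{P}'' = \psi'_1, \dots, \psi'_m$, where
$m \le n$, corresponding to the MP-proof
$\mathcal{P}^* = \psi_1, \dots, \psi_n$ as the theorem
says. If $\psi_{n+1} \in A$ then
$\mathcal{P}' = \psi'_1, \dots, \psi'_m, \psi_{n+1}$,
or $\mathcal{P}' = \mathcal{P}''$ if $\psi_{n+1}$ already
occurs in $\mathcal{P}''$, and the claim holds trivially.
Otherwise $\psi_{n+1}$ is derived by some deduction rule
from $\mathcal{P}^*$. Both deduction rules are discussed
separately.

First, $\psi_{n+1}$ is derived by the rule of substitution
from $\psi_i$, $1 \le i \le n$. It means that there is a
substitution $\sigma$ s.t. $\psi_{n+1} = \sigma(\psi_i)$.
There is a formula $\psi'_j \in \mathcal{P}''$,
$1 \le j \le i$, and a substitution $\theta$ s.t.
$\psi_i = \theta(\psi'_j)$. It means that
$\psi_{n+1} = \theta \circ \sigma(\psi'_j)$ and
$\mathcal{P}' = \mathcal{P}''$.

Second, $\psi_{n+1}$ is derived by the rule of modus
ponens from $\psi_i$ and $\psi_j$, $1 \le i < j \le n$.
Without loss of generality
$\psi_i = \psi_j \to \psi_{n+1}$. There are formulae
$\psi'_k, \psi'_l \in \mathcal{P}''$, $1 \le k, l \le j$,
formulae $\varphi$, $\psi$, and substitutions $\theta$ and
$\eta$ s.t. $\psi_i = \theta(\psi'_k) =
\theta(\varphi) \to \theta(\psi)$ and
$\psi_j = \eta(\psi'_l)$. We produce a variant $\psi''_l$
of $\psi'_l$ which does not have a common variable with
$\varphi$ and $\psi$. Since
$\theta(\varphi) = \eta(\psi'_l)$, there is the m.g.u.
$\zeta$ of $\varphi$ and $\psi''_l$. We produce a variant
$\zeta'$ of $\zeta$ s.t.
$(Var(\zeta'(\varphi)) \setminus Var(\varphi)) \cap
Var(\psi) = \emptyset$. Thus
$\mathcal{P}' = \psi'_1, \dots, \psi'_m, \zeta'(\psi)$
and there is $\tau$ s.t. $\psi_{n+1} = \theta(\psi) =
\zeta' \circ \tau(\psi) = \tau(\zeta'(\psi))$. ■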


Corollary 3.2 Let $\varphi$ be a formula and $A$ be a set of axioms. Then $\varphi$ is MP-provable in $A$ iff there is a formula $\psi$ and a substitution $\sigma$ s.t. $\psi$ is D-provable in $A$ and $\sigma(\psi) = \varphi$.

Note. It is easy to transform any MP-proof $\mathcal{P}$ to another MP-proof $\mathcal{P}'$ such that all the substitutions occur before any application of modus ponens. Theorem 3.1 can be understood, from a certain point of view, as an attempt to produce an MP-proof $\mathcal{P}''$ where modus ponens occurs before substitution as much as possible.

3.1. Proofs with a given length

In Hilbert-style calculi with only finitely many axioms it is hard to enumerate explicitly all the formulae provable in a given number of steps, because there are in general infinitely many substitution instances. Our situation is completely different: there are only finitely many such provable formulae (up to variants) if we use only condensed detachment, namely:

Observation 3.3 Let $A$ be a set of axioms with $|A| = m$ and let $\Gamma_n$ be the set of all formulae D-provable in $A$ by proofs with at most $n$ steps. Then $|\Gamma_n|$ is $O(m^{\dots})$ up to variants.

This means that for a finite set of axioms $A$ we can iteratively generate all formulae provable in it. Thus if there is an MP-proof $\mathcal{P}$ of $\varphi$ in a finite $A$ with at most $n$ steps, then by Theorem 3.1 there is a D-proof $\mathcal{P}'$ of some $\psi$ in $A$ with at most $n$ steps and a substitution $\sigma$ such that $\sigma(\psi) = \varphi$. Since there is a finite upper bound on the number of all possible $\psi$, see Observation 3.3, and we can easily test whether there is such a substitution $\sigma$ for given $\psi$ and $\varphi$, we can produce a proof $\mathcal{P}'$ in finite time. Moreover, we can find all such $\psi$ (there are only finitely many up to variants) and all D-proofs $\mathcal{P}'$ of $\psi$ in $A$ not longer than $n$. Among them there is also some $\psi'$ and its D-proof $\mathcal{P}''$ in $A$ from which we can construct an MP-proof $\mathcal{P}'''$ of $\varphi$ in $A$ with at most $n$ steps. This way we can also show that there is no MP-proof of $\varphi$ in a finite $A$ with at most a given number of steps.
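
The test for a suitable substitution mentioned above is one-sided matching rather than full unification. A minimal Python sketch follows; it assumes formulae are encoded as nested tuples with string variables, and the names `imp` and `match` are hypothetical helpers, not notation from the paper.

```python
def imp(a, b):
    """Implication a -> b encoded as a tuple; variables are plain strings."""
    return ("->", a, b)

def match(pattern, target, subst=None):
    """Return a substitution s with s(pattern) == target, or None if none exists."""
    subst = {} if subst is None else subst
    if isinstance(pattern, str):
        # A variable must be mapped consistently at all its occurrences.
        bound = subst.setdefault(pattern, target)
        return subst if bound == target else None
    if isinstance(target, str):
        return None  # an implication cannot be mapped onto a variable
    subst = match(pattern[1], target[1], subst)
    return None if subst is None else match(pattern[2], target[2], subst)

# ((p -> p) -> p) -> p and two more general formulae it is an instance of.
instance = imp(imp(imp("p", "p"), "p"), "p")
peirce = imp(imp(imp("q", "r"), "q"), "q")
other = imp(imp(imp("q", "q"), "r"), "r")
```

Matching runs in time linear in the size of the two formulae, so the search described above is dominated by the generation of D-provable formulae, not by this test.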

3.2. An application of condensed detachment in
proof complexity

Urquhart in [5] proves a lower bound on the length of proofs in Hilbert-style calculi for classical propositional logic with the rules of modus ponens and substitution, called substitution Frege systems in proof complexity. There are tautologies of length $O(n)$, for sufficiently large $n$, which require proofs with [...] steps. The proof is based on the connection between MP-proofs and D-proofs via Theorem 3.1.

4. D-completeness 

Although we know that, for a given set of axioms $A$, there is a tight connection between MP-provable formulae and substitution instances of D-provable formulae, this does not mean that any MP-provable formula is also D-provable (up to variants) without the use of substitution. On the other hand, neither does it mean that there is an MP-provable formula which is not D-provable. To elaborate on this problem we define the notion of a D-complete set of axioms.


Definition 4.1 Let $A$ be a set of axioms and $\Gamma$ be the set of all formulae MP-provable in $A$. We say that $A$ is D-complete if all the formulae in $\Gamma$ are D-provable in $A$.

Theorem 3.1 tells us what the sets which are not D-complete look like:

Observation 4.1 Let $A$ be a set of axioms. Then $A$ is not D-complete iff there is a formula $\varphi$ and a substitution $\sigma$ s.t. $\varphi$ is D-provable in $A$, but $\sigma(\varphi)$ is not.

The essential question is whether such a somewhat strange notion of D-completeness makes sense at all. However, in [2] Hindley and D. Meredith show that BCI and BCK are not D-complete, but BCKW and BCKWP are D-complete.

Definition 4.2 Let $\varphi$ be a formula MP-provable in a set of axioms $A$. We say that the formula $\varphi$ is basic w.r.t. $A$ if there is no formula $\psi$ MP-provable in $A$ and non-renaming substitution $\sigma$ s.t. $\varphi = \sigma(\psi)$. We say that a set of formulae $\Gamma$ is basic w.r.t. $A$ if all $\varphi \in \Gamma$ are basic w.r.t. $A$. Moreover, we say that a set of axioms $A$ is basic if $A$ is basic w.r.t. $A$.

Note. For any formula $\varphi$ MP-provable in $A$, there is a formula $\psi$ basic w.r.t. $A$ and a substitution $\sigma$ s.t. $\varphi = \sigma(\psi)$. However, such a formula need not be unique: the formula $((p \to p) \to p) \to p$ is a substitution instance of $((q \to r) \to q) \to q$ as well as of $((q \to q) \to r) \to r$. Both these formulae are basic w.r.t. any set of axioms complete for classical propositional logic.

Lemma 4.2 Let $A$ be a set of axioms and $\varphi$ be a formula basic w.r.t. $A$. Then $\varphi$ is D-provable in $A$.

Proof: From Theorem 3.1 it follows that there is a formula $\psi$ D-provable in $A$ and a substitution $\sigma$ such that $\sigma(\psi) = \varphi$. Since $\varphi$ is basic w.r.t. $A$, $\sigma$ is a renaming and consequently $\psi$ is a variant of $\varphi$. ■

We say that two sets of axioms $A_1$ and $A_2$ are MP-equivalent if they have the same sets of MP-provable formulae.

Theorem 4.3 Let sets of axioms $A_1$ and $A_2$ be MP-equivalent. If $A_1$ is D-complete and basic, then $A_2$ is also D-complete.

Proof: Let $\varphi$ be a formula MP-provable in $A_2$. Then $\varphi$ is MP-provable in $A_1$, and consequently also D-provable in $A_1$, by the D-completeness of $A_1$. Since $A_1$ is basic w.r.t. $A_1$, and thus basic w.r.t. $A_2$ as well, all the formulae in $A_1$ are D-provable in $A_2$, by Lemma 4.2. Therefore we can transform any D-proof of $\varphi$ in $A_1$ into a D-proof of $\varphi$ in $A_2$. ■

Note. In [6], three MP-equivalent sets of three axioms are presented, BB'I among them, but only one of them is D-complete. Hence BB'I is not D-complete by Theorem 4.3, because BB'I is basic. Moreover, the two remaining sets differ only in one axiom, and the one from the D-complete set is a substitution instance of the other one from the set which is not D-complete. Although it may look a bit surprising, it holds generally.

Corollary 4.4 If a set of axioms A is not D-complete 
then there is no set of axioms A' MP-equivalent to A, 
D-complete, and basic. 

Since we already know that BCI, BCK, and BB'I are not D-complete, we know that there are no D-complete and basic sets of axioms MP-equivalent to them.

On the other hand, Theorem 4.3 has mainly a positive meaning. We can easily check that BCKW and BCKWP are basic. It means that any set of axioms which, together with modus ponens and substitution, is complete for the implicational fragment of intuitionistic logic or classical logic, respectively, is also D-complete.

The following lemmata, especially the second one, are very useful for proving that some set of axioms is D-complete. They say that not even all the instances of axioms are D-provable in sets of axioms which are not D-complete.

Lemma 4.5 Let $A$ be a set of axioms. All the substitution instances of axioms in $A$ are D-provable iff $A$ is D-complete.

Proof: Any MP-proof V can be transformed to an 
MP-proof V' where all the substitutions occur before 
any application of modus ponens, and modus ponens 
can be easily simulated by condensed detachment. The 
converse direction follows from the definition of D- 
completeness. ■ 


Lemma 4.6 ([6]) Let $A$ be a set of axioms and $\varphi \to \varphi$ be D-provable in $A$ for any formula $\varphi$. Then $A$ is D-complete.

Proof: For any $\varphi$ MP-provable in $A$, there exists $\psi$ s.t. $\psi$ is D-provable in $A$ and $\varphi$ is a substitution instance of $\psi$. From the provability of $\psi$ and $\varphi \to \varphi$ we immediately obtain that $\varphi$ is provable by condensed detachment.


Note. The fact that $A$ contains I and all the instances of other axioms are provable does not mean that $A$ is D-complete. Let $A = \{((\varphi \to \varphi) \to \varphi) \to \varphi \mid \varphi \text{ is a formula}\} \cup \{p \to p\}$. Then $A$ is not D-complete since only formulae in $A$ are provable.

It is evident that for any set $A$ there exists its superset $A' = \{\varphi \mid \varphi \text{ is MP-provable in } A\}$ which is D-complete and has the same MP-provable formulae as $A$. However, such a set is infinite even for a finite $A$, if $A \neq \emptyset$. Moreover, there is a finite set $A$, namely $A = $ I, which does not have a finite D-complete superset MP-equivalent to $A$.

Theorem 4.7 There is no finite set of axioms A which is 
D-complete and MP-equivalent to I. 

Proof: Assume that such a set $A = \{\varphi_1, \dots, \varphi_n\}$, consisting only of substitution instances of $p \to p$, exists. Since our setting is very special, we show that any D-proof in $A$ can be transformed to an equivalent D-proof in $A$, i.e. one proving the same formula, with very special properties.

The condensed detachment of $\varphi \to \varphi$ and $\psi$ is $\sigma(\varphi) = \sigma(\psi')$, for the m.g.u. $\sigma$ of $\varphi$ and $\psi'$, which is a suitable variant of $\psi$. The key point is that a formula which is the result of unification of $\varphi$ and $\psi'$ is itself the result of condensed detachment. Let $\psi: \chi_1, \dots, \chi_m$ mean $D(\dots(D(D\psi\chi_1)\chi_2)\dots)\chi_m$. Such a notation represents a formula by presenting its proof. The following three statements hold. All of them can be proved by checking the properties of most general unifiers and how the rule of condensed detachment behaves in our very special setting.

1. $\psi: \chi_1, \dots, \chi_m$ is a variant of $\psi: \chi'_1, \dots, \chi'_k$, where $\chi'_1, \dots, \chi'_k$, for $k \le m$, contains exactly once all the members of $\chi_1, \dots, \chi_m$ in any order.

2. All the following formulae are variants of each other:

$\psi_1: \chi_1, \dots, \chi_k, (\psi_2: \chi_{k+1}, \dots, \chi_m)$, (1)

$\psi_1: (\psi_2: \chi_1, \dots, \chi_m)$, (2)

$\psi_1: (\psi_2: \chi'_1, \dots, \chi'_l)$, (3)

where $\chi'_1, \dots, \chi'_l$, for $l \le m$, contains exactly once all the members of $\chi_1, \dots, \chi_m$ in any order.

3. $\psi_1: (\psi_2: \dots (\psi_k: \chi_1, \dots, \chi_m) \dots)$ is a variant of $\psi'_1: (\psi'_2: \dots (\psi'_l: \chi_1, \dots, \chi_m) \dots)$, where $\psi'_1, \dots, \psi'_l$, for $l \le k$, contains exactly once all the members of $\psi_1, \dots, \psi_k$ in any order.

Consequently, any D-proof in $A$ can be transformed to a D-proof $\psi_1: (\psi_2: \dots (\psi_k: \chi_1, \dots, \chi_m) \dots)$, where $k, m \le n$; if $i < j$, $\psi_i = \varphi_{i'}$ and $\psi_j = \varphi_{j'}$ then $i' < j'$; and if $i < j$, $\chi_i = \varphi_{i'}$ and $\chi_j = \varphi_{j'}$ then $i' < j'$. Therefore there are only finitely many D-provable formulae in $A$ up to variants. ■


5. Conclusion 

We presented the rule of condensed detachment and studied Hilbert-style propositional calculi in which it is the only deduction rule. We showed a connection between such calculi and more standard calculi with the rules of modus ponens and substitution. Although generally not all the substitution instances of axioms are provable by condensed detachment, there are sets of axioms in which this is true, and we provided some observations on such calculi.

Abstract

The Kalman filter, first published in the 1960s, is nowadays used in a large number of applications, such as navigation by means of GPS, and more generally wherever measurements cannot be made without disturbing noise. Its use, however, requires storing a covariance matrix in computer memory, which can be a problem when the filter is applied on spaces of very high dimension.

One solution is the ensemble Kalman filter, which was proposed as a Monte Carlo approximation of the original Kalman filter. This paper focuses on explaining this solution, while other possible solutions are mentioned briefly with the relevant references.

Despite the fact that the ensemble Kalman filter has been widely used ever since its first publication in 1994, studies of its asymptotic properties and of its expected convergence to the Kalman filter were lacking until recently. Because these properties are closely related to the dimension of the space, this leads to the idea of carrying both filters over to a space of infinite dimension. This work will be part of the author's doctoral studies, and the related problems currently being addressed are listed at the end of this paper.

1. Introduction

The Kalman filter (KF) was first presented in the 1960s in the papers [8] and [9] as a possible solution to the classical statistical problem of filtering signal from noise. To this day it is among the most widely used filters, and its applications appear in a number of modern and popular problems, such as GPS navigation or FM signal reception.

Another use of the KF is in so-called data assimilation. Data assimilation is a statistical method for estimating the true state of a system (typically a dynamical system evolving in time) by fusing various measurements that are unevenly distributed in space and time.

This paper is structured as follows. The first part describes the problem in some detail, followed by a description of its solution by means of the KF. The next part deals with the problems that arise when the KF is used on spaces of too high a dimension and with solutions to these problems, focusing on the ensemble Kalman filter. Several other types of filters originally derived from the KF are also mentioned. The last part of the paper poses hitherto unsolved questions about carrying the KF over to a general Hilbert space, which the author will address during his doctoral studies.

2. Definitions and problem description

Assume that at discrete time instants

$t_1, \dots, t_K$

we have input data (for example, measurements of some state variables) in the form of $m$-dimensional vectors

$D_{t_1}, \dots, D_{t_K}$,

i.e. $D_{t_i} \in \mathbb{R}^m$. The state of the system at the individual time instants is described by vectors

$X_{t_1}, \dots, X_{t_K}$

of length $n$. Note that the state of the system at a given instant is a random vector. We make two basic assumptions about its distribution:

• it has a density on $\mathbb{R}^n$ and

• it has a bounded second moment.

The input data are linked to the state of the system through an observation function (operator)

$h_{t_i}$,

which may change in time but is known. The evolution of the individual states of the system is described by a function

$\mathcal{M} : (X_{t_i}, t_i, t_{i+1}) \mapsto X_{t_{i+1}}$

and we assume that the states have the Markov property, i.e. that the equality

$p(X_{t_K} \mid X_{t_1}, \dots, X_{t_{K-1}}) = p(X_{t_K} \mid X_{t_{K-1}})$, (1)

holds, where $p(\cdot)$ denotes a density. In practice, the function $\mathcal{M}$ is mostly given in the form of some numerical model. From the statistical point of view it is then often, in a certain sense, a "black box". Under these conditions our goal is to estimate the true state of the system at time $t_K$ using all the data available up to time $t_{K-1}$. To this end we use Bayes' theorem, which states that under the conditions we have declared

$p(X^a_t) = p(X^f_t \mid D_t) \propto p(D_t \mid X^f_t)\, p(X^f_t)$. (2)

$X^f_t$ is called the prior state of the system and $X^a_t$ the posterior state of the system (in English-language meteorological sources this state is often called the "analysis state").

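3. Kalman filter

The KF assumes that the functions $\mathcal{M}$ and $h_{t_i}$ are linear, so that they can be written in the form

$\mathcal{M}(X_{t_i}, t_i, t_{i+1}) = M_{t_i} X_{t_i} + b_{t_i}$, (3)

$h_{t_i}(X_{t_i}) = H_{t_i} X_{t_i} + h^0_{t_i}$,

where $M_{t_i}$ and $H_{t_i}$ are matrices of size $n \times n$ and $m \times n$, and $b_{t_i}$ and $h^0_{t_i}$ are vectors of length $n$ and $m$. It is further assumed that the prior distribution of the system state and the conditional distribution of the input data are normal with some regular covariance matrices $Q^f_{t_i}$ and $R_{t_i}$:

$X^f_{t_i} \sim N(\mu^f_{t_i}, Q^f_{t_i})$, $\quad D_{t_i} \mid X^f_{t_i} \sim N(h_{t_i}(X^f_{t_i}), R_{t_i})$.

From Bayes' theorem (2) it follows that the posterior distribution of the state of the system is also normal. To express the mean and the covariance matrix of this distribution, we define the matrix

$K_{t_i} = Q^f_{t_i} H^T_{t_i} (H_{t_i} Q^f_{t_i} H^T_{t_i} + R_{t_i})^{-1}$, (4)

with which these characteristics can be computed by the simple formulas

$\mu^a_{t_i} = \mu^f_{t_i} + K_{t_i}(D_{t_i} - H_{t_i}\mu^f_{t_i})$, (5)

$Q^a_{t_i} = (I - K_{t_i} H_{t_i})\, Q^f_{t_i}$. (6)

Deriving these equalities requires only Bayes' theorem and the Markov property, because by combining (1) and (2) we obtain for the posterior distribution of the state of the system

$p(X^a_{t_i}) \propto \dots$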

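So if we want to predict, at time $t_i$, the state of the system at the next step, the KF gives us a simple two-step algorithm:

• first step - determination of the posterior distribution

$\mu^a_{t_i} = \mu^f_{t_i} + K_{t_i}(D_{t_i} - H_{t_i}\mu^f_{t_i})$,

$Q^a_{t_i} = (I - K_{t_i} H_{t_i})\, Q^f_{t_i}$,

• second step - forecast

$\mu^f_{t_{i+1}} = M_{t_i}\mu^a_{t_i} + b_{t_i}$,

$Q^f_{t_{i+1}} = M_{t_i} Q^a_{t_i} M^T_{t_i}$.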
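
The two steps can be illustrated in the simplest possible setting. The Python sketch below assumes a scalar state and a scalar measurement ($n = m = 1$) and a zero offset in the observation function, so that all matrices in (4)-(6) reduce to numbers; the function names `kf_analysis` and `kf_forecast` are hypothetical and chosen only for this illustration.

```python
def kf_analysis(mu_f, q_f, d, h, r):
    """First step: posterior mean and variance from the prior and the datum d."""
    k = q_f * h / (h * q_f * h + r)   # gain, scalar form of (4)
    mu_a = mu_f + k * (d - h * mu_f)  # posterior mean, scalar form of (5)
    q_a = (1.0 - k * h) * q_f         # posterior variance, scalar form of (6)
    return mu_a, q_a

def kf_forecast(mu_a, q_a, m, b):
    """Second step: propagate mean and variance through the linear model m*x + b."""
    return m * mu_a + b, m * q_a * m

# One full cycle: prior N(0, 1), measurement 2 with noise variance 1.
mu_a, q_a = kf_analysis(mu_f=0.0, q_f=1.0, d=2.0, h=1.0, r=1.0)
mu_next, q_next = kf_forecast(mu_a, q_a, m=2.0, b=1.0)
```

With equal prior and measurement variances the gain is one half, so the posterior mean lies midway between the prior mean and the measurement, and the posterior variance is halved.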

It should be noted that the KF allows the matrix $M_{t_i}$ to change in time. This property is needed because in most practical problems the function $\mathcal{M}$ is not linear and the terms in (3) are replaced by some linear approximation. One option is to approximate the matrix $M_{t_i}$ by the Jacobian of the original function $\mathcal{M}$ at $t_i$ and the vector $b_{t_i}$ by the value $\mathcal{M}(0, t_i, t_{i+1})$.

Because the KF has been known and used for many decades, information about it can be found in a great many books and lecture notes. Readers who are more interested in this topic are referred, for example, to the book [12], or to the further sources listed at http://www.cs.unc.edu/~welch/kalman/, which provides a nice summary of current knowledge about the KF, including its asymptotic properties and rate of convergence.

A great advantage of the KF is that it provides exact algebraic expressions for the mean (5) and the covariance matrix (6) of the posterior distribution of the state of the system. These computations, however, first require the matrix $K_{t_i}$ from (4). This in turn requires knowledge of the $n \times n$ matrix $Q^f_{t_i}$, which we do assume to be known; nevertheless, when $n$ is too large it may not be easy (or even possible) to store such a large matrix in the memory of a computer, however powerful. For example, weather forecasting for Europe nowadays commonly uses a 3D grid with a horizontal cell size of 10 km and about 30-50 vertical levels. With 6 state variables, $n$ is about $5 \times 10^{\dots}$. At such a high dimension there is as yet no computer able to store the matrices $Q^f_{t_i}$ and $Q^a_{t_i}$ in memory and then compute with them.

4. Ensemble Kalman filter

The ensemble Kalman filter (EnKF) is one possible solution to the problem of a high-dimensional state space. The basic idea is simple: we replace the covariance matrices $Q^f_{t_i}$ and $Q^a_{t_i}$ by sample covariance matrices. These have the same dimensions, but, as we show later, they never need to be stored in full in computer memory at any step of the computation.

To be able to compute a sample covariance matrix we have to work, at each time $t_i$, with a random sample of $n$-dimensional vectors

$X^f_{t_i,1}, \dots, X^f_{t_i,N}$.

In the English-language geoscience literature this sample is often called an "ensemble"; its individual members are then understood as the possible scenarios of the evolution of the state of the atmosphere.

In statistics, a random sample is a set of random variables (vectors) all of whose members are identically distributed and mutually independent. As we show later, the ensemble members are not independent; nevertheless, in practice (especially in the geosciences) the terms ensemble and random sample are often used interchangeably.

The number of ensemble members is $N$, where $N \ll n$. Each member contains all the variables describing the state of the system. The sample covariance matrix computed from the ensemble is then substituted into formulas (5) and (6). The ensemble is obtained by perturbing the input data,

$D_{t_i} + V_{t_i,1}, \dots, D_{t_i} + V_{t_i,N}$,

where $V_{t_i,j}$ are randomly generated, mutually independent data from the normal distribution

$V_{t_i,j} \sim N(0, R_{t_i}) \quad \forall j = 1, \dots, N$.

Recall that we assumed the following conditional distribution of the input data:

$D_{t_i} \mid X^f_{t_i} \sim N(h_{t_i}(X^f_{t_i}), R_{t_i})$.

For the subsequent computation we define the prior estimate of the mean $\bar{X}^f_{t_i}$ and the sample covariance matrix $C^f_{t_i}$ in the standard way:

$\bar{X}^f_{t_i} = \frac{1}{N} \sum_{j=1}^{N} X^f_{t_i,j}$,

$C^f_{t_i} = \frac{1}{N-1} \sum_{j=1}^{N} (X^f_{t_i,j} - \bar{X}^f_{t_i})(X^f_{t_i,j} - \bar{X}^f_{t_i})^T$.

Just as the KF required the matrix $K_{t_i}$ from (4), the EnKF requires the matrix

$E_{t_i} = C^f_{t_i} H^T_{t_i} (H_{t_i} C^f_{t_i} H^T_{t_i} + R_{t_i})^{-1}$.

The EnKF is then obtained by applying equalities (5) and (6) to each ensemble member, with the matrix $K_{t_i}$ replaced by the matrix $E_{t_i}$ and the covariance matrix $Q^f_{t_i}$ replaced by the sample covariance matrix $C^f_{t_i}$. The EnKF algorithm thus has the form

• first step - determination of the posterior distribution

$X^a_{t_i,j} = X^f_{t_i,j} + E_{t_i}(D_{t_i} + V_{t_i,j} - H_{t_i} X^f_{t_i,j}), \quad j = 1, \dots, N$,

• second step - forecast

$X^f_{t_{i+1},j} = M_{t_i} X^a_{t_i,j} + b_{t_i}, \quad j = 1, \dots, N$.

Note that while in the classical KF the prior mean $\mu^f_{t_{i+1}}$ and covariance matrix $Q^f_{t_{i+1}}$ are determined deterministically, their analogues in the EnKF, $\bar{X}^f_{t_i}$ and $C^f_{t_i}$, are random variables.

The greatest benefit of the EnKF is that the covariance matrix never needs to be stored in memory during the whole computation. Let $u$ be an arbitrary vector of length $n$; then the computation can be simplified as

$C^f_{t_i} u = \frac{1}{N-1} \sum_{j=1}^{N} (X^f_{t_i,j} - \bar{X}^f_{t_i}) \left[ (X^f_{t_i,j} - \bar{X}^f_{t_i})^T u \right]$.

Note that evaluating the expression

$(X^f_{t_i,j} - \bar{X}^f_{t_i})^T u$

amounts to computing the scalar product of two vectors of length $n$. This means that $C^f_{t_i} u$ can be reduced to a sum of $N$ scalar products of two vectors of length $n$. Similarly, one can simplify

$H_{t_i} C^f_{t_i} H^T_{t_i} = \frac{1}{N-1} \sum_{j=1}^{N} \left( H_{t_i}(X^f_{t_i,j} - \bar{X}^f_{t_i}) \right) \left( H_{t_i}(X^f_{t_i,j} - \bar{X}^f_{t_i}) \right)^T$,

where the two factors in the sum have sizes $m \times 1$ and $1 \times m$. Once again, a computation with extremely large matrices has been reduced to the computation of scalar products.
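
The saving can be checked on a toy example. The following Python sketch is only an illustration under stated assumptions: the ensemble is a plain list of equally long lists of floats, the sample covariance is normalized by $N - 1$, and the function names are hypothetical. It evaluates the product of the sample covariance matrix with a vector $u$ using one scalar product per ensemble member, without ever forming the $n \times n$ matrix.

```python
def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def sample_mean(ensemble):
    """Componentwise mean of the ensemble members."""
    count = len(ensemble)
    return [sum(column) / count for column in zip(*ensemble)]

def cov_times_vector(ensemble, u):
    """Return C u, where C is the sample covariance matrix of the ensemble."""
    count = len(ensemble)
    mean = sample_mean(ensemble)
    result = [0.0] * len(u)
    for member in ensemble:
        anomaly = [x - m for x, m in zip(member, mean)]
        weight = dot(anomaly, u) / (count - 1)  # one scalar product per member
        for k, a in enumerate(anomaly):
            result[k] += weight * a
    return result

# Toy ensemble with N = 2 members of dimension n = 3.
ensemble = [[1.0, 2.0, 0.0], [3.0, 2.0, 4.0]]
product = cov_times_vector(ensemble, [1.0, 1.0, 1.0])
```

The work is proportional to $N n$ and the memory to $n$, whereas forming the matrix explicitly would need memory proportional to $n^2$.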

The EnKF was first published in 1994 in the paper [4]. Since then it has become very popular and is used in a great many cases where high-dimensional data need to be assimilated, for example in weather forecasting, wildfire spread prediction, or image processing. A detailed summary of the method, together with its implementation and further extensions, can be found in [5]. The book [7] is also excellent, though extremely extensive. A large number of further papers, presentations, and source code related to the method can be found at http://enkf.nersc.no/, a site which has been and still is maintained primarily by the originator of the method.

Although the method has been known and used for almost twenty years, theoretical studies of whether and under what conditions the EnKF converges to the KF were lacking for a long time. Most sources offered only an argument based on the independence of

$V_{t_i,1}, \dots, V_{t_i,N}$

and the resulting independence of the ensemble members

$X_{t_i,1}, \dots, X_{t_i,N}$.

These members are, however, tied together by the initial conditions, so the assumption of their independence is not correct. Only recently have two studies examining the convergence of the EnKF appeared, namely [11] and [13]. The second paper uses a modified weak law of large numbers in which the requirement of independence of the ensemble members is replaced by their invariance under permutations; with its help, the convergence of the sample covariance matrix to the true covariance matrix is proved. The paper also proves the convergence of the filter and the rate of this convergence. In both cases $n$ is considered fixed and convergence is meant for $N \to \infty$.

5. Further possible extensions of the Kalman filter

It follows from the construction of the sample covariance matrix $C^f_{t_i}$ that its rank is at most $N - 1$, and for this reason the perturbation of the ensemble

$X^a_{t_i,1} - X^f_{t_i,1}, \dots, X^a_{t_i,N} - X^f_{t_i,N}$

is confined to the column space of the matrix $C^f_{t_i}$. Considering that $N \ll n$, this appears to be a serious shortcoming of the EnKF. It has even been shown, in [1], that this fact can under certain conditions cause the EnKF to diverge. One remedy is the method of localization, which is treated in detail for example in [1] and [14].

Another possible question is whether perturbing the data introduces too much noise. This problem is treated in detail in [6], which proposes a variant of the EnKF without data perturbation; the EnKF algorithm is then extended by one step:

• added step

• first step - determination of the posterior distribution

$X^a_{t_i,j} = \bar{X}^a_{t_i} + (X^f_{t_i,j} - \bar{X}^f_{t_i})\, \tilde{E}_{t_i}$,

• second step - forecast

$X^f_{t_{i+1},j} = M_{t_i} X^a_{t_i,j} + b_{t_i}$,

where the matrix $\tilde{E}_{t_i}$ is defined as a solution of the equation

$C^a_{t_i} = (I - \tilde{E}_{t_i} H_{t_i})\, C^f_{t_i}$.

Another possible remedy is to try to generate the noise more efficiently than in the classical EnKF described in Section 4. A very large group of methods belonging to this category are the so-called "square-root" filters. Readers interested in these methods are referred to the paper [15].

Besides those listed here, there are of course other kinds of filters derived from the EnKF that are used for the assimilation of high-dimensional data. Extensive works covering a large number of further filters are [2] and [10].

6. KF and EnKF on a Hilbert space

All the asymptotic results mentioned in the previous sections hold when $m$ and $n$ are fixed, or bounded. A natural question remains whether these properties are preserved when $m$ and $n$ are increased (i.e. when the number of measurements grows and the model is refined). In statistics it is often assumed that measurement errors at two different locations are mutually independent. With an ever finer grid, however, this assumption can hardly be expected to hold. The question therefore is whether this may adversely affect the behaviour of the filter. This problem is known as the "curse of dimensionality".

For these reasons it would be ideal to have a proof of convergence of the EnKF regardless of the dimension of the space. In other words, the convergence would have to be proved on a general Hilbert space (a complete vector space with an inner product). This task is not entirely simple, though; for example, to the best of the author's knowledge, not even an extension of the classical KF to a general Hilbert space has been published so far, which is of course a necessary condition for working with the EnKF on such a space.

As a first step, let us give the definition of a random variable. Let $W$ be a general (infinite-dimensional) Hilbert space. We denote the inner product defined on it in the standard way by

$\langle \cdot, \cdot \rangle$.

A random variable on the space $W$ is then defined as a measurable mapping

$X : (\Omega, \mathcal{S}, P) \to (W, \mathcal{B}(W))$,

where $(\Omega, \mathcal{S}, P)$ is a probability space defined in the standard way and $\mathcal{B}(W)$ is the $\sigma$-algebra of all Borel sets on $W$. The mean of such a random variable,

$E[X] \in W$,

is defined as the solution of the equation

$\langle u, E[X] \rangle = E[\langle u, X \rangle] \quad \forall u \in W$.

The covariance operator $\mathrm{Cov}(X, Y)$ is then defined as the solution of the equation

$\langle u, \mathrm{Cov}(X, Y)\, v \rangle = E[\langle u, X - E X \rangle \langle v, Y - E Y \rangle] \quad \forall u, v \in W$.

The mean is uniquely defined for all random variables

$X \in L^1(\Omega, W)$.

Similarly, the covariance operator is uniquely defined for any two random variables

$X, Y \in L^2(\Omega, W)$.

These statements follow from the Riesz representation theorem. Such random variables are treated in detail, for example, in the book [3].

In the ideal case we would now simply substitute these characteristics into equalities (5) and (6) and thereby extend the KF to the space $W$. Unfortunately it is not that simple, and this attempt opens several questions, for example:

• How should a density be defined on an infinite-dimensional space?

• Does Bayes' theorem hold on such a space?

• In computing the KF, step (4) requires inverting the matrix $(H_{t_i} Q^f_{t_i} H^T_{t_i} + R_{t_i})$. On the space $W$ an analogously defined operator would have to be inverted. Does this inverse operator exist at all? If so, is it bounded?

• How should a random perturbation be defined on the space $W$?

The answers to these and further questions are not yet known to the author, who will work on them during his doctoral studies.

Abstract

In this paper we apply the concepts of agent, role and group to the field of hybrid intelligence. The model is formalized in axioms of description logic. Axioms under the open-world assumption allow us to define the necessary relations between concepts and individuals in the system. Axioms interpreted under the closed-world assumption express integrity constraints. The model is implemented in a separate ontology agent which fulfils the functions of matchmaking and correctness verification in computational MAS. Apart from a simple computational MAS scenario, we specify the role-based model of other hybrid intelligence techniques, such as external and evolutionary learning, and preprocessing.

1. Introduction 

An agent is a computer system situated in some environment that is capable of autonomous action in this environment in order to meet its design objectives [1]. Its important features are adaptivity to changes in the environment and collaboration with other agents. Interacting agents join in more complex societies, multi-agent systems (MAS). These groups of agents gain several advantages, such as applicability in distributed systems, delegation of subproblems to other agents, and flexibility of software system engineering.

Many present-day applications require dynamic and open societies. The importance of the interaction and cooperation aspects of agents therefore increases. The effort to reuse MAS patterns brings the need to separate the interaction logic from the inner algorithmic logic of an agent. There are several approaches providing such separation and modeling a MAS from the organizational perspective, such as tuple spaces, group computation, activity theory or roles [2].


The definitions of the concept of role vary in different frameworks [2]. Generally speaking, a role is an abstract representation of stereotypical behavior common to different classes of agents. Moreover, it serves as an interface through which agents perceive their execution environment and affect this environment. Such a representation contains a set of actions, capabilities, which an associated agent may utilize to achieve its goals. On the other hand, the role defines constraints which a requesting agent has to satisfy to obtain the role, as well as responsibilities for which the agent playing this role is held accountable. The role also serves as a means of defining protocols, common interactions between agents. An agent may hold several roles, and a role can be embodied by different classes of agents. Moreover, agents can change their roles dynamically.
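
The notions in the previous paragraph can be pictured as a small data structure. The Python sketch below is purely illustrative and is not the model formalized later in the paper: the class names `Role` and `Agent`, their fields, and the example role are all assumptions made for this illustration. A role bundles capabilities, responsibilities and an admission constraint, and an agent may acquire and drop roles at run time.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class Role:
    name: str
    capabilities: frozenset      # actions the role makes available
    responsibilities: frozenset  # duties the role holder is accountable for
    constraint: Callable[["Agent"], bool]  # condition a requesting agent must satisfy

@dataclass
class Agent:
    name: str
    skills: set
    roles: dict = field(default_factory=dict)

    def take(self, role):
        """Acquire the role if its constraint is satisfied; report success."""
        if not role.constraint(self):
            return False
        self.roles[role.name] = role
        return True

    def drop(self, role_name):
        """Give up a role; agents may change roles dynamically."""
        self.roles.pop(role_name, None)

    def can(self, action):
        """An action is available if some currently held role provides it."""
        return any(action in role.capabilities for role in self.roles.values())

# Hypothetical example: a learning role that requires a gradient-based skill.
trainer = Role(
    name="trainer",
    capabilities=frozenset({"train"}),
    responsibilities=frozenset({"report_error"}),
    constraint=lambda agent: "gradient" in agent.skills,
)
rbf = Agent("rbf_network", {"gradient"})
accepted = rbf.take(trainer)
```

Keeping the constraint inside the role rather than inside the agent mirrors the separation of interaction logic from the inner algorithmic logic of an agent discussed above.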

Role-based solutions may be independent of a particular situation in a system. This allows designing an overall organization of multi-agent systems, represented by roles and their interactions, separately from the algorithmic issues of agents, and reusing the solutions in different application contexts. The coordination of agents is based on local conditions, namely the positions of an agent playing the role; thus even a large MAS can be built out of simple organizational structures in a modular way.

Computational multi-agent systems, i.e. the application of agent technologies in the field of hybrid intelligence, have proved promising thanks to their configuration flexibility and capability of parallel computation. In order to automate the composition of computational MAS, its formal model in description logic (DL) was introduced [3]. We employ the concepts of role and group and transform the role model into axioms of DL [4]. In this paper, the necessity of defining axioms both under the open-world and the closed-world assumption is highlighted, and the model is extended by integrity constraints. This formal description allows dynamic finding of suitable agents and groups (matchmaking), verification of correctness of MAS (system checking) or automated creation of MAS according to the task.

Figure 1: Two examples of computational MAS: the simplest one (left), and the more complicated one (right) containing a neural network trained by an evolutionary algorithm.

In the next section, we present a computational intelligence scenario and elaborate the role-based model of a computational MAS. In Section 3, the model is formalized by means of description logic axioms. Data-mining processes often require data pre-processing; the role of a pre-processing agent is defined and included in the model in Section 4. In Section 5, the implementation of an ontology agent managing the dynamic role-based model of MAS is described. Section 6 concludes the paper and outlines future work.


2. Role Model of Computational MAS Scenario 

Hybrid models including combinations of artificial intelligence methods, such as neural networks, genetic algorithms, and fuzzy logic controllers, can be seen as complex systems with a large number of components and computational methods, and with potentially unpredictable interactions between these parts. These approaches have demonstrated better performance than individual methods in many real-world tasks [5]. Their disadvantages are their greater complexity and the need to set them up manually and tune various parameters. Also, there are not many software packages that provide a large collection of individual computational methods as well as the possibility to connect them into hybrid schemes in various ways. Multi-agent systems seem to be a suitable solution to manage the complexity and dynamics of hybrid systems. In our approach, a computational MAS contains one or more computational agents, i.e. highly encapsulated objects embodying a particular computational intelligence method and collaborating with other autonomous agents to fulfill their goals. Several models of development of hybrid intelligent systems by means of MAS have been proposed, e.g. [6] and [7].


In order to illustrate the abilities of role-based models, we present an
example analysis of a computational MAS scenario. We exploit the conceptual
framework of the AGR model [8]. Its organization-centered perspective, which
allows modular and variable construction of a MAS, is well suited especially
to more complicated configurations of computational agents. GAIA, on the
other hand, establishes a static assignment between roles and agent classes.
We leave this dynamic aspect to the algorithms controlling individual
instances of agents. These algorithms employ the concepts of groups and
roles, and are allowed to change roles and enter groups at run time.

For two examples of computational MAS see Figure 1. These descriptions
correspond to the physical implementation of agents employing the JADE agent
platform and the Weka data mining library [3]. The system in our scenario
consists of a Task Manager agent, a Data Source agent, two computational
agents (an RBF neural network and an Evolutionary algorithm agent) and
supplementary agents. In the case of the RBF network, there are unsupervised
(vector quantization) and supervised (gradient, matrix inverse) learning
agents. The evolutionary algorithm agent needs Fitness, Chromosome, Shaper
and Tuner agents.

Such a computational MAS is represented by the role organizational structure
shown in Figure 2. It consists of possible groups and their structures,
described by means of admissible roles and interactions between them. This
organizational structure contains the following group structures:

• Computational Group Structure. It contains three roles: a Task Manager,
a Computational Agent implementing a computational method, and a Data
Source which provides it with training and testing data.

• Simple Learning Group Structure, consisting of two roles: a Teacher and
a Learned Computational Agent. This structure is instantiated by three
groups, one for each Teacher (Vector Quantization, Gradient and Matrix
Inverse).

• Evolutionary Algorithm Group Structure. It contains an Evolutionary
Algorithm Agent, an Evolved Computational Agent, a Chromosome which
translates the representation of an individual into the model parameters,
a Tuner with probabilities of the algorithm, and a Shaper scaling the
individual fitness.


Figure 2: The organizational structure diagram of the computational MAS
(panels: Computational Group Structure, Simple Learning Group Structure,
Evolutionary Alg. Group Structure).



Figure 3: The organization of a concrete computational MAS scenario
(cheeseboard notation)


Every concrete organization of the MAS is built with respect to the rules of
the organizational structure. The aims of the agents are fulfilled by
assuming roles or by establishing groups and interactions. The agents can
play different roles in different groups, and even a complicated MAS can be
built from these structures.

We will show a typical run of such a computational MAS with a data-mining
task. At the beginning of the run, only the computational group exists, with
the RBF network in the role of a computational agent. After the task manager
requests learning of the problem, appropriate simple learning groups are
created and the learning agents are constructed, reused or found. Similarly,
the evolutionary algorithm group is constructed with all supplementary
agents. The interactions proceed according to the definition of the
organizational structure. Figure 3 shows a state of the concrete
organization of such a computational MAS.

We can see that the role model simplifies the construction of more
complicated computational multi-agent systems by decomposing them into
simple group structures and roles to which the agents are assigned.
Moreover, the position of an agent in a MAS at every moment of the run time
is defined by its roles, without the need to take account of its internal
architecture or the concrete methods it implements. It also reduces the
space of possible responding agents when interactions are established.


3. Description Logic Model of Computational MAS

The family of Description Logics (DL), a fragment of first-order logic, is
nowadays the de facto standard ontology description language for formal
reasoning [9]. In DL, a knowledge base is divided into a T-Box
(terminological box), which contains expressions describing concept
hierarchies, and an A-Box (assertional box) containing ground sentences.

Web Ontology Language (OWL), an expressive knowledge representation
language, is based on description logic [10]. The semantics of OWL is
designed for scenarios where complete information cannot be assumed; thus it
adopts the Open World Assumption (OWA). According to the OWA, a statement
cannot be inferred to be false only on the basis of a failure to prove it.
Because complete knowledge is not assumed, the T-Box axioms cannot be used
as Integrity Constraints (ICs) which would test the validity of the
knowledge base. In order to check integrity constraints, the Closed World
Assumption (CWA) is necessary. There are several approaches simulating the
CWA by different formalisms, e.g. rules or queries [10].

We continue in the effort to describe the computational MAS in the
description logic model [3]. Our model incorporates the concepts of group
and role. In paper [4], we have elaborated a basic role-based model of
computational MAS in description logic under the OWA.

We want to preserve the simplicity of the OWL models and also to express ICs
in the same language. In [10] the authors presented an IC validation
solution reducing the IC validation problem to SPARQL query [11] answering.
Moreover, they introduced a prototype IC validator extending Pellet [12],
the OWL reasoner. For example, the axiom stating that every product has a
manufacturer,

$Product \sqsubseteq \exists isManufacturedBy.Manufacturer$

is not violated under the OWA when the A-Box defines a product without a
manufacturer [10]. The SPARQL representation of this IC is the following
query, which succeeds exactly when a violating product exists:

    ASK WHERE {
      ?x rdf:type Product.
      OPTIONAL {
        ?x isManufacturedBy ?y.
        ?y rdf:type Manufacturer.
      }
      FILTER(!BOUND(?y))
    }
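
The closed-world reading of this constraint can also be illustrated outside
SPARQL. The following Python sketch is a toy under stated assumptions, not the
Pellet-based validator of [10]: the A-Box is a hypothetical list of triples,
and the helper name products_without_manufacturer is ours. It returns the
products for which the ASK query would succeed, i.e. the violators.

```python
# Hypothetical A-Box encoded as (subject, predicate, object) triples.
def products_without_manufacturer(triples):
    """Closed-world check: list every Product that has no isManufacturedBy
    link to an individual typed as Manufacturer."""
    typed = {(s, o) for s, p, o in triples if p == "rdf:type"}
    products = sorted(s for s, o in typed if o == "Product")
    violators = []
    for x in products:
        makers = [o for s, p, o in triples
                  if s == x and p == "isManufacturedBy"]
        # Absence of a matching fact counts as a violation (CWA).
        if not any((y, "Manufacturer") in typed for y in makers):
            violators.append(x)
    return violators


abox = [
    ("p1", "rdf:type", "Product"),
    ("p1", "isManufacturedBy", "m1"),
    ("m1", "rdf:type", "Manufacturer"),
    ("p2", "rdf:type", "Product"),  # no manufacturer asserted
]
violators = products_without_manufacturer(abox)
```

Under the OWA the same A-Box is consistent, because an unnamed manufacturer
of p2 may simply be unknown; only the closed-world check reports p2.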


Thus we divided the T-Box of the proposed model into two parts. The first
part contains axioms describing mainly the concept hierarchy and the
necessary relations between their instances. This schema is interpreted
under the OWA and defines the facts the reasoner will infer from the given
A-Box. The second part contains constraints which define the integrity
conditions of the system, related mainly to the capabilities of agents.
These are interpreted under the CWA.^1 The time-dependent information, i.e.
the current state of the system, is stored in the A-Box of the ontology.

As we have already mentioned, a role is defined as a set of capabilities,
i.e. actions (interactions) an agent assuming this role can use, and a set
of responsibilities, i.e. events the agent should handle. A group is then
described by the set of roles it contains. The hierarchy of concepts should
respect this. The designed T-Box contains the following superior concepts:

• Responder is a responsibility of a role. It stands for a message type the
agent handles.

• Initiator represents an action from a capability set and it is closely
related to a particular Responder. The functional role isInitiatorOf relates
the action to the agent that uses it. The role sendsTo contains the agents
to which the action is connected.

• RequestInit is a subclass of the previous concept which defines only
those initiators that send messages to one agent (unlike e.g. the contract
net protocol). This concept adds the following IC:

$RequestInit \sqsubseteq_C\ = 1\ sendsTo$

• Agent is a superclass of all roles. The role assignment is achieved
simply by a concept assertion of the agent individual. The inverse
functional roles hasInitiator (inverse of isInitiatorOf) and hasResponder
couple an agent with particular actions and responsibilities. While the
hasResponder relation is a fixed property, the hasInitiator relation occurs
only when a corresponding connection is established. Finally, the functional
role isMemberOf indicates belonging to a group.

• Group represents a group in a MAS. It has only one role, an inverse of
the isMemberOf role, called hasAgent.

The computational group structure contains three agents with the assigned
roles of a task manager, computational agent and data source. Two
connections can be established among these roles. First, the task manager
sends control messages to the computational agent in order to solve a
problem. Such a message contains the necessary parameters (data file name,
learning options) and an action the computational agent should perform, such
as training and testing. The second connection is between the computational
agent and the data source, which provides data from a specified file.

The sending of control messages between the task manager (TaskManager),
which initiates this connection, and the computational agent (CompAgent) is
modeled by two concepts, an initiator (ControlMsgInit) and a responder
(ControlMsgResp). The initiator of this connection is an instance of
ControlMsgInit, which is a subclass of the RequestInit class. It sends
messages only to an agent with a running responder handling these messages,
and it is coupled with the TaskManager role as a capability. The schema file
of the ontology contains axioms of the initiator and responder concept
hierarchy, and a definition of the responder individual:


$ControlMsgInit \sqsubseteq_O RequestInit$

^1 Axioms of the T-Box are distinguished in the following text by a subscript
of the inclusion symbol. A standard schema axiom interpreted under the OWA
has the form $C \sqsubseteq_O D$. An integrity constraint under the CWA has
the form $C \sqsubseteq_C D$.

The following integrity constraint for this concept checks the roles of the
initiating and responding agents:

$ControlMsgInit \sqsubseteq_C \forall sendsTo.\exists hasResponder.ControlMsgResp \sqcap \forall isInitiatorOf.TaskManager$

The control message responder is a simple descendant of the Responder
concept, and this class contains the instance ControlMsg. The schema axioms
follow:

$ControlMsgResp \sqsubseteq_O Responder$
$ControlMsgResp(ControlMsg)$

The data connection between the computational agent and the source of data
(DataSource) is again divided into two classes: the initiator DataMsgInit
and the responder DataMsgResp. The following axioms for these concepts are
similar to those for the control connection:

$DataMsgInit \sqsubseteq_O RequestInit$
$DataMsgInit \sqsubseteq_C \forall sendsTo.\exists hasResponder.DataMsgResp \sqcap \forall isInitiatorOf.CompAgent$
$DataMsgResp \sqsubseteq_O Responder$
$DataMsgResp(DataMsg)$

Role definitions are descendants of the Agent concept and have to contain
their responsibilities, i.e. responders (capabilities are defined on the
initiator side). The responsibility of the computational agent (CompAgent)
is to respond on the control connections. The following axioms are inserted
in the schema set:

$CompAgent \sqsubseteq_O Agent \sqcap \exists hasResponder.ControlMsg$

The data source (DataSource) handles requests for data, and the task manager
(TaskManager) role only sends messages in a group:

$DataSource \sqsubseteq_O Agent \sqcap \exists hasResponder.DataMsg$
$TaskManager \sqsubseteq_O Agent$

Finally, the computational group (CompGroup) contains only agents which have
asserted that they have the computational agent, task manager or data source
role. The subclass axiom is important for open-world reasoning:

$CompGroup \sqsubseteq_O Group$

On the other hand, the entrance of an agent with a wrong role has to be
checked by the following closed-world constraint:

$CompGroup \sqsubseteq_C \forall hasAgent.(CompAgent \sqcup TaskManager \sqcup DataSource)$




Figure 4: Hierarchy of main ontology concepts in the computational MAS model

The simple learning group structure is defined in a similar way by the
following schema and integrity rules:


PhD Conference '11, ICS Prague. Ondřej Kazík: Role Model of Hybrid Intelligence ...


$LearningMsgResp \sqsubseteq_O Responder$
$LearningMsgResp(LearningMsg)$
$LearningMsgInit \sqsubseteq_O RequestInit$
$LearningMsgInit \sqsubseteq_C \forall sendsTo.\exists hasResponder.LearningMsgResp \sqcap \forall isInitiatorOf.LearnedCA$
$LearnedCA \sqsubseteq_O Agent$
$Teacher \sqsubseteq_O Agent \sqcap \exists hasResponder.LearningMsg$
$SimpleLearningGroup \sqsubseteq_O Group$
$SimpleLearningGroup \sqsubseteq_C \forall hasAgent.(Teacher \sqcup LearnedCA)$

The evolutionary algorithm group contains an evolutionary algorithm, an
evolved computational agent, a tuner with parameters of the algorithm, and a
chromosome, i.e. the representation of individuals:

$EAControlMsgInit \sqsubseteq_O RequestInit$
$EAControlMsgInit \sqsubseteq_C \forall sendsTo.\exists hasResponder.EAControlMsgResp \sqcap \forall isInitiatorOf.EvolvedCA$
$EAParamsMsgInit \sqsubseteq_O RequestInit$
$EAParamsMsgInit \sqsubseteq_C \forall sendsTo.\exists hasResponder.EAParamsMsgResp \sqcap \forall isInitiatorOf.EvoAlgorithm$
$FitnessMsgInit \sqsubseteq_O RequestInit$
$FitnessMsgInit \sqsubseteq_C \forall sendsTo.\exists hasResponder.FitnessMsgResp \sqcap \forall isInitiatorOf.EvoAlgorithm$
$ComputeModelMsgInit \sqsubseteq_O RequestInit$
$ComputeModelMsgInit \sqsubseteq_C \forall sendsTo.\exists hasResponder.ComputeModelMsgResp \sqcap \forall isInitiatorOf.Chromosome$
$EAControlMsgResp \sqsubseteq_O Responder$
$EAControlMsgResp(EAControlMsg)$
$EAParamsMsgResp \sqsubseteq_O Responder$
$EAParamsMsgResp(EAParamsMsg)$
$FitnessMsgResp \sqsubseteq_O Responder$
$FitnessMsgResp(FitnessMsg)$
$ComputeModelMsgResp \sqsubseteq_O Responder$
$ComputeModelMsgResp(ComputeModelMsg)$
$EvolvedCA \sqsubseteq_O Agent \sqcap \exists hasResponder.ComputeModelMsg$
$EvoAlgorithm \sqsubseteq_O Agent \sqcap \exists hasResponder.EAControlMsg$
$Chromosome \sqsubseteq_O Agent \sqcap \exists hasResponder.FitnessMsg$
$Tuner \sqsubseteq_O Agent \sqcap \exists hasResponder.EAParamsMsg$
$EvoAlgorithmGroup \sqsubseteq_O Group$
$EvoAlgorithmGroup \sqsubseteq_C \forall hasAgent.(EvolvedCA \sqcup EvoAlgorithm \sqcup Chromosome \sqcup Tuner)$

The main concepts in the ontology described above are shown in Figure 4.

4. Role Model of Preprocessing 

Real data sets are often imperfect and noisy: they contain outliers, missing
values or impossible data combinations, or they carry redundant and
irrelevant information. The computational modeling techniques also impose
requirements on the data set. Therefore, a separate pre-processing phase
before the main computation is necessary [13]. There is a variety of
pre-processing techniques with different effects on data, e.g. feature
extraction, missing values and outlier filtering, or resampling.

In order to keep the flexibility of the computational MAS solution, we
implement the pre-processing method as a separate agent. This pre-processing
agent obtains data from a data source and provides pre-processed data to
other agents. The options of the pre-processing method and the source file
have to be set by a task manager which controls the computation.

Therefore the interactions defined in Section 3 can be utilized. The
pre-processing agent gains properties of both the data source (it provides
data) and the computational agent (it receives data from another source and
waits for control messages). Thus the role of PreprocessingAgent is defined
as an intersection of DataSource and CompAgent:

$PreprocessingAgent \sqsubseteq_O CompAgent \sqcap DataSource$

According to this definition, a pre-processing agent with this role is also
able to enter any computational group. The definition also makes it possible
to create chains of agents, with an agent providing the original data table
on one end and a data mining computational method on the other. A diagram of
such a configuration with two pre-processing agents is shown in Figure 5.



Figure 5: Example of computational MAS configuration with two preprocessing
agents.

On a request of the task manager, the pre-processing agent obtains data from
the specified data source and performs the pre-processing. On a subsequent
request for the data from another computational agent, it sends the
pre-processed data.

The task manager has to prepare the sequence of computations of the whole
data mining process. Its responsibility is to run the pre-processing agents
in the right order before it sends a request for the computation of the data
mining computational method.
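
The ordering responsibility of the task manager can be sketched as follows.
This is a hypothetical illustration, not the JADE implementation: source_of
is an assumed mapping from each agent to the agent it requests data from,
and run_order is our own helper name.

```python
def run_order(source_of, target):
    """Return the agents in the order in which the task manager has to
    trigger them so that every agent finds its input already prepared.
    The raw data source at the start of the chain needs no request and is
    therefore not part of the result."""
    order = []
    agent = target
    while agent in source_of:        # walk from the model back to the data
        order.append(agent)
        agent = source_of[agent]
    order.reverse()                  # upstream agents must run first
    return order


# Assumed chain: ds -> prep1 -> prep2 -> rbf (two pre-processing agents).
source_of = {"rbf": "prep2", "prep2": "prep1", "prep1": "ds"}
order = run_order(source_of, "rbf")
```

The sketch assumes a simple linear chain without cycles; a configuration
with several inputs per agent would need a topological sort instead.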


5. Implementation

To coordinate the run-time role organization of a MAS built according to the
schemas and constraints of the T-Box, it is necessary to have a central
authority, a separate agent in which the DL model is represented. Other
agents change the state of the model and query it by interacting with this
agent.

The model is implemented as an ontology agent (OA) in JADE, a Java-based
framework for MAS [14]. The goals of the OA are:

• Keeping track of the current state of the MAS. Agents present in the MAS
register themselves in the OA, state changes of their roles, create and
destroy groups and their membership in them, and establish communication
channels.

• Verification of correctness of the MAS. The OA controls all changes of
the system and does not allow activities which would violate the integrity
constraints.

• Matchmaking of agents and groups. By exploiting the concept hierarchy, it
is possible to search for groups of certain types or for agents that have a
certain role, that are members of a certain group, or that can handle
certain types of messages.

Figure 6: Architecture of the ontology agent.

The ontology agent (shown in Figure 6) is built around a request handling
module which is responsible for processing incoming requests and replying.
It employs the ontology functions provided by the Pellet OWL-DL reasoner
[12] and its extensions. The ontology model contains the assertional box of
the ontology and describes the current state of the system. The open-world
reasoner infers new facts from the axioms in the OWL schema file and the
content of the A-Box. The integrity constraints, saved in a separate OWL
file, are converted into SPARQL queries and run by the SPARQL engine on the
ontology model. The SPARQL engine is also used to answer matchmaking
queries.

A communication ontology for the contents of OA messages has been created.
This ontology consists of three types of concepts.

The first group contains actions changing the state of the ontology. These
actions result in changes of assertions in the A-Box of the model and are
validated by the integrity constraints. If any of the ICs is violated, the
change is not performed and the action ends in failure. For example, setting
the role of computational agent for the agent rbf is achieved by adding the
following assertion to the A-Box:

$CompAgent(rbf)$

Removing the role is equivalent to removing this assertion. Similarly, the
creation of the data initiator init of agent rbf and its connection to an
agent ds results in:

$DataMsgInit(init)$
$hasInitiator(rbf, init)$
$sendsTo(init, ds)$

Note that if the CompAgent role is not among the roles of the agent which
owns that initiator, or the receiving agent does not have a DataMsgResp
responder, an integrity constraint is violated and the change of the A-Box
is undone.
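
The validate-then-commit behaviour described above can be sketched in a few
lines of Python. This is a simplified stand-in for the OA, under our own
assumptions: the A-Box is a set of tuples such as ("CompAgent", "rbf") or
("sendsTo", "init", "ds"), and the names data_init_constraint and
apply_change are hypothetical.

```python
def data_init_constraint(abox):
    """Closed-world check of the DataMsgInit constraint: the owner of each
    initiator must be a CompAgent and every receiver must have a responder
    that is a DataMsgResp."""
    inits = [f[1] for f in abox if len(f) == 2 and f[0] == "DataMsgInit"]
    for init in inits:
        owners = [f[1] for f in abox
                  if len(f) == 3 and f[0] == "hasInitiator" and f[2] == init]
        targets = [f[2] for f in abox
                   if len(f) == 3 and f[0] == "sendsTo" and f[1] == init]
        if any(("CompAgent", a) not in abox for a in owners):
            return False
        for t in targets:
            responders = [f[2] for f in abox if len(f) == 3
                          and f[0] == "hasResponder" and f[1] == t]
            if not any(("DataMsgResp", r) in abox for r in responders):
                return False
    return True


def apply_change(abox, additions, constraints):
    """Try the additions on a copy; keep them only if every IC holds."""
    candidate = set(abox) | set(additions)
    if all(check(candidate) for check in constraints):
        return candidate, True
    return set(abox), False          # change undone, action fails


base = {("CompAgent", "rbf"), ("DataSource", "ds"),
        ("hasResponder", "ds", "DataMsg"), ("DataMsgResp", "DataMsg")}
change = [("DataMsgInit", "init"), ("hasInitiator", "rbf", "init"),
          ("sendsTo", "init", "ds")]
accepted_abox, accepted = apply_change(base, change, [data_init_constraint])
rejected_abox, rejected = apply_change(base - {("CompAgent", "rbf")},
                                       change, [data_init_constraint])
```

In the second call the owner rbf lacks the CompAgent role, so the candidate
state is discarded and the original A-Box is returned unchanged.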

The second group contains concepts specifying matchmaking queries on groups
or agents. The description of the requested agent contains its role, the
group of which it should be a member, or its responders. These queries are
transformed into SPARQL queries [10] and executed on the inferred model.
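
As a rough illustration of such a matchmaking query (the real OA translates
it to SPARQL; this sketch does not), the following Python function filters a
hypothetical in-memory registry. The registry layout and the name
find_agents are assumptions made for the example.

```python
def find_agents(registry, role=None, group=None, responder=None):
    """Return the names of agents matching all criteria that are given.
    A criterion left as None is not constrained."""
    matches = []
    for name, info in registry.items():
        if role is not None and role not in info["roles"]:
            continue
        if group is not None and group not in info["groups"]:
            continue
        if responder is not None and responder not in info["responders"]:
            continue
        matches.append(name)
    return sorted(matches)


# Hypothetical state of the inferred model.
registry = {
    "rbf": {"roles": {"CompAgent", "LearnedCA"}, "groups": {"g1", "g2"},
            "responders": {"ControlMsg"}},
    "ds": {"roles": {"DataSource"}, "groups": {"g1"},
           "responders": {"DataMsg"}},
    "vq": {"roles": {"Teacher"}, "groups": {"g2"},
           "responders": {"LearningMsg"}},
}
data_providers = find_agents(registry, group="g1", responder="DataMsg")
```

Because the query runs on the inferred model, roles derived by the reasoner
(for example those of a pre-processing agent) would be found as well.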

The third group of concepts contains concepts informing 
about results of actions or queries. 

6. Conclusion 

With the rising complexity and dynamism of heterogeneous multi-agent
systems, the importance shifts from the inner structure and algorithmic
logic of individual agents to the cooperation and interaction aspects of the
system. The concept of role and role-based models simplifies the development
of such multi-agent systems.

A hybrid intelligence system, i.e. a cooperation of various methods of
artificial intelligence, has been successfully implemented as a
computational multi-agent system. It contains one or more computational
agents, i.e. autonomous encapsulations of individual computational methods,
and enables flexibility in the run and development of such a MAS.

In this article, we have elaborated the role-based model of a computational
MAS realizing hybrid intelligence. The model is formalized by means of
description logic. Both the deduction axioms and the integrity constraints
are defined in the same formalism of OWL-DL, with a distinction between the
open-world and closed-world assumptions.

In order to support real-world data-mining processes, the models of the
computational group, the external and evolutionary learning groups, and the
pre-processing agent have been included. The proposed model of
pre-processing allows defining chains of pre-processing agents which
gradually resolve the inconsistencies of the input data.

The ontology agent representing the model of the current MAS state has been
implemented. The ontology agent allows general management, correctness
verification and matchmaking of the MAS with the concepts of agents, roles
and groups. For this purpose, reasoning over and querying of the DL model
are employed.

Further research will focus on an ontological classification of
computational methods, their parameters and input data. This extension will
broaden the ability of the model to express the dynamics of the
computational MAS. The role-based model is currently being integrated into
the computational multi-agent system Pikater [15], where the problem arises
of choosing the best computational method with respect to unknown input data
characteristics, i.e. meta-learning.

Abstract

Formal modeling methods are becoming an important part of today's software
development process. The Alloy modeling language, one of the emerging
modeling approaches, is gaining popularity due to its powerful and
attractive syntax based on first-order logic and relational calculus. One of
the key features of the Alloy solver, the Alloy Analyzer, is the ability to
find a random instance of a given Alloy model. One outcome of this feature
is a simplification of the Alloy model development process, as the found
instances are likely to reveal flaws of the model and can thus serve for
debugging. Compared to the traditional inspection of an error trace in the
context of model checking techniques, this brings a serious development
speedup. However, this technique turns out to be ineffective for large-scale
models of complex systems, as the found instances become too complex and the
search for them too time-demanding. In this paper, we describe the
particular difficulties which may be encountered during modeling of complex
systems, give a real-life case study, and propose an overall approach to
address these difficulties.

1. Introduction 

While modeling large-scale complex systems with multiple concerns, it is
often difficult to develop, analyze, and debug the formal models, since they
are too big to be verified or checked on a regular basis during development.
Moreover, the results given by a model verifier (error traces, etc.) are
generally hard to interpret, which further impedes model development and
analysis. To facilitate model analysis, the Alloy framework [3,4] offers the
possibility of finding and inspecting a random instance of the model in
development, because inspecting such an instance is likely to reveal some of
the flaws of the model. In the rest of the paper, we refer to this method of
model analysis, based on finding and inspecting model instances, as
example-driven model analysis. As an aside, in the context of Alloy the
search for model instances is limited by their maximal size (in terms of the
number of employed objects). Alloy justifies this approach by the small
scope hypothesis [4], according to which counterexamples invalidating a
model tend to occur already in small model instances, which in turn are
easier to comprehend. Compared to other methods of model analysis, such as
inspection of error traces, the results of example-driven model analysis are
far easier to comprehend. Since this method provides fast feedback during
model development, it is suitable for rapid prototyping of formal models.

However, as a model gets larger, it becomes more difficult to identify the
flaws by analyzing its instances. This is basically caused by three
restraining factors:

(a) The instance generation is random (depending on the implementation of
the underlying SAT solver, which often employs random steps to solve the
given formula), and the model developer has to first recognize the structure
of the new model instance before he can actually analyze it. This becomes
more difficult as the instance gets bigger. Moreover, a model developer
typically needs to analyze only variations of the part of the model which he
is currently working on. However, when the developer updates the
specification, the newly generated instance can be completely different from
the one generated from the original specification.

(b) As the Alloy Analyzer tool supports incremental execution of the
underlying SAT solver, it allows traversing a sequence of generated
instances. However, the 'interesting' instances may be scattered in this
sequence (and the sequence is likely to differ between executions of the
tool). Therefore, a tedious traversal of the sequence is required each time
a certain scenario is needed for analysis (for example after a change in the
model, in order to verify that the change corrected a flaw in the scenario).
The sequences of generated instances may contain a large number of
'uninteresting' instances (due to variability in the 'uninteresting' parts
of the specification). Together with factor (a), this makes the traversal
very difficult.

(c) The performance demands grow exponentially with the size of the
specification (because the specification is transformed into a SAT formula).
For rapid development of Alloy models, this represents a serious drawback.

All these limitations restrain the usage of example- 
driven analysis in context of large and complex models 
in Alloy. 

The Alloy Analyzer provides basically two ways of fighting the restraining
factors. The former is the already-mentioned configurable visualization of
instances. However, this visualization has severe limitations, such as
restricted layout modifications, constrained visualization of individual
instance elements, etc. The latter is to provide the model finding process
with a suitable Alloy specification which constrains the set of model
instances to the currently 'interesting' ones. However, in non-trivial
models an adequate constraint specification is too extensive and too complex
to be written manually.

Nevertheless, the method of systematically constraining the model appears to
be a promising approach for fighting the mentioned drawbacks of
example-driven analysis. In this sense, a challenge is to find an automatic
or semi-automatic way of assessing the constraints and converting them to a
corresponding Alloy specification.

1.1. Goals and Structure of the Paper 

In this paper, we propose a modeling method for Alloy (i.e., multilevel
modeling) which enables example-driven model analysis also for large-scale
and complex models. The approach is based on dividing a model into separate
partitions by exploiting the inherent internal structure of the model, and
on separate constrained analysis of the individual partitions by
semi-automatically making the rest of the model immutable. The goal is to
allow easy and comprehensible analysis of a particular model part in
development. The key idea is that the immutable model partitions restrict
the set of all model instances, so that the found instances are easier to
comprehend (as they share properties determined by the immutable model
partitions) and, due to the implementation of the Alloy Analyzer, also
quicker to find (since the underlying SAT solver does not have to solve the
whole formula but is given a partial solution based on the immutable
partitions). In fact, the respective partitions of the model in development
are made immutable according to a selected sub-instance. This sub-instance
can be either artificial (automatically or manually created for the purpose
of model immutability only), or derived from a previously analyzed instance.

The possible use cases of this approach are the following:

(a) During development, several instances are found, one of them indicating
a model flaw. After the model is corrected, it is necessary to re-check not
only the particular erroneous instance but also the (in some sense)
'similar' instances in order to assess whether the flaw was actually fixed.
After an appropriate sub-instance of the erroneous instance is made
immutable, the subsequently found instances will address the updated parts
of the model.

(b) After a new partition of the model in development has been created, it
is required to analyze it using 'reasonable' example-driven analysis. After
a suitable sub-instance comprising only the previous model partitions
(created either from an existing instance using Alloy's model finding
feature, or manually) is made immutable, the subsequently found instances
will introduce variability in the new model partition only and will thus
facilitate its analysis. Moreover, they will share a common structure
determined by the fixed sub-instance, which makes them easier to comprehend.
Obviously, to achieve the required level of confidence, the process has to
be repeated for several immutable sub-instances.
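
The effect of fixing a sub-instance can be illustrated on a toy model. The
following Python sketch is not Alloy and does not use a SAT solver; it is a
brute-force stand-in under our own assumptions, where a "model" is a
predicate over boolean variables and the names instances, variables and
constraint are hypothetical.

```python
from itertools import product


def instances(variables, constraint, fixed=None):
    """Enumerate all assignments satisfying the constraint. Variables named
    in `fixed` play the role of the immutable sub-instance: they keep their
    given value and only the remaining variables are searched."""
    fixed = fixed or {}
    free = [v for v in variables if v not in fixed]
    for values in product([False, True], repeat=len(free)):
        candidate = dict(fixed)
        candidate.update(zip(free, values))
        if constraint(candidate):
            yield candidate


# Toy model with an "old" partition (a, b) and a "new" partition (c, d).
variables = ["a", "b", "c", "d"]
constraint = lambda m: (m["a"] or m["b"]) and (m["c"] != m["d"])

all_found = list(instances(variables, constraint))
constrained = list(instances(variables, constraint,
                             fixed={"a": True, "b": False}))
```

With the old partition fixed, the search space shrinks from 16 candidates to
4, and every instance found differs only in the new partition, which mirrors
use case (b) on a very small scale.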

In summary, our contribution is:

• Experiments. We present a real-life case study that demonstrates the
obstacles of example-driven modeling of complex systems and outlines the
approach to address them.

• Multilevel modeling. We introduce the idea of extending the basic
example-driven analysis approach by systematically fixing a particular
partition of the model in development in order to find constrained
instances, thus allowing easier analysis and debugging of the model. We
refer to this method as multilevel modeling. Moreover, we propose the
necessary tool support and associated workflow for the multilevel modeling
method.


The rest of the paper is structured as follows: Section 2 presents a
real-life case study based on a formal model of AADL, Section 3 presents a
brief outline of the multilevel modeling method and proposes the associated
tooling, Section 4 surveys the related work, and Section 5 concludes the
paper and gives remarks on future work.


2. Case Study

To illustrate the mentioned obstacles of example-driven analysis in the
context of large-scale complex systems, we present a case study based on
an Alloy specification of the AADL [7] modeling language, which is an
industrial standard for modeling embedded and real-time component
systems.

2.1. Internal Structure of the AADL Model

The presented Alloy model of AADL inherently consists of several
distinct parts representing different concerns and different levels of
abstraction.

Specification of Component Structure. The base model part defines the
component structure of the AADL and its semantics. This part comprises
the definition of ports, components, component properties, and component
hierarchy. Besides the structural component definition, this model part
also determines what a valid component refinement can look like, what
properties a particular component type can have, etc. The following code
example contains a partial specification of the AADL component structure
including ports and component refinement, a specification of the System
component type, and basic requirements on a valid system structure.
Figure 1 shows a simple instance of the structural part.

abstract sig Component {
    in_ports: set AbstractPort,
    out_ports: set AbstractPort,
    subcomponents: set Component,
    ...
}

fact ComponentNotInSubcomponents {
    all c: Component | c not in c.^subcomponents
}

sig System extends Component { ... }

fact SystemSubcomponents {
    System.subcomponents in (System + Data + Process)
}

-- In a valid system only a System component can be
-- the top level component
pred ValidSystem {
    all c: (Component - System) |
        one pc: Component |
            c in pc.subcomponents
}

Figure 1: AADL structural model instance.

Specification of Component Bindings. Based on the structural part, a
compositional part is specified. This part comprises the specification
of port types, valid component composition, and component bindings.
Semantically, this part determines valid application architectures. The
following code example contains a partial specification of AADL port
types and basic requirements on a valid system architecture. Figure 2
shows a simple instance of the compositional model part.

abstract sig Port extends AbstractPort {
    connection: set Port
}

-- A valid connection is either
fact {
    all p: connection.Port | some parent: Component |
        all dp: p.connection |
            (
                -- in-port delegation from parent
                -- to a subcomponent
                some cmp: parent.subcomponents |
                    (
                        p in parent.in_ports
                        and dp in cmp.in_ports
                    ) or (
                        -- out-port delegation from
                        -- a subcomponent to parent
                        p in cmp.out_ports
                        and dp in parent.out_ports
                    )
            ) or (
                -- or connection of two subcomponents
                -- from out-port to in-port
                some disj cmp1, cmp2: parent.subcomponents |
                    p in cmp1.out_ports
                    and dp in cmp2.in_ports
            )
}

Figure 2: AADL compositional model instance.


Specification of Architecture Reconfigurations. The third model part is
the architecture reconfiguration specification. This part comprises the
specification of valid architecture changes in a system. In AADL,
architecture changes are modeled using Finite State Automata (FSA) where
each state represents a valid system architecture. These states are
called architectural modes since they represent possible versions/modes
of the system architecture. Further, the reconfiguration specification
captures the transitions between the modes and the conditions that have
to be met in order to trigger the transitions. The following code
example contains a partial specification of AADL modes and mode
transitions, as well as their basic properties. Figure 3 shows a simple
instance of the reconfiguration model.

sig Mode extends AbstractMode {
    active_components: set Component,
    connections: Port -> Port,
    owner: one Component
}{
    active_components in owner.subcomponents

    -- A mode can refer to existing connections
    -- between currently active components only
    all p1, p2: Port | (p1->p2) in connections => (
        (p2 in p1.connection) and
        (p1+p2) in
            (owner.ports + active_components.ports)
    )
}

sig ModeTransition extends AbstractModeTransition {
    triggerEventPort: set Port,
    srcMode: one Mode,
    dstMode: one Mode,
    owner: one Component
}{
    -- Only transitions between modes from
    -- the same component are allowed.
    owner = srcMode.owner
    owner = dstMode.owner

    -- Only ports from the owner component
    -- can be triggered.
    triggerEventPort in owner.ports
}

Figure 3: AADL reconfiguration model instance.

The AADL standard also introduces other model parts such as the
platform-definition part or the control-and-data-flow part. These are
left out of this case study for simplicity.


2.2. Analysis of the AADL Model

Using the case study presented in Section 2.1, we can illustrate the
drawbacks of the example-driven model analysis which were listed in
Section 1.

Figure 4: Uninteresting variability in structural part.

Figure 5: Complex instance of the AADL reconfiguration model.

As for (a), while analyzing a more elaborate version of the
reconfiguration model part, the Alloy Analyzer will likely find complex
random instances which are hard to interpret (Figure 5). Moreover, each
time the model is updated, the subsequently found instance will probably
represent a different reconfiguration scenario based on a different
architecture (e.g., the actual architecture in Figure 5 significantly
differs from the one in Figure 3).

As for (b), the instances found in the instance sequence will likely
introduce variability in uninteresting model parts, such as variability
in port specification while analyzing the compositional specification
(Figure 4). Naturally, this increases the complexity of analyzing large
instances such as the one in Figure 5.

As for (c), Table 1 illustrates the growth of performance demands while
developing the individual layers of the AADL model. Measurements were
performed on a partial AADL specification similar to the one illustrated
in the previous section, bounded by at most 10 objects for each Alloy
signature. The measurements do not include the time needed for
generation of the associated SAT formula. It is clear that while
developing the full-fledged versions of all model parts, the
example-driven analysis would be greatly impeded by the growth of
performance demands.



                                                  Min    Max    Average
Structural part                                    28     71       37.8
Compositional part                                135    228      145.6
Reconfiguration part                              397    439      410
Reconfiguration part with a fixed sub-instance     40     68       46.5

Table 1: Time required for finding an instance of the individual model
parts in ms.

3. Solution 

As mentioned in the first section, the multilevel modeling method,
targeting example-driven analysis of complex models, is based on a
semi-automated process of making a particular model partition immutable
according to a selected sub-instance. Not only does the immutable model
partition make the found instances more comprehensible (as it enforces
certain properties and patterns to be shared by all the instances), but
it also reduces the time complexity of the model finding process (as the
immutable partition represents a partial solution of the associated SAT
formula). As an aside, it may be necessary to repeat the process for
several sub-instances in order to get the required level of confidence
in the results of the model analysis.


As an example, in the context of the reconfiguration model part
presented in the case study, it is beneficial to fix a particular system
architecture (or a set of architectures) and analyze the reconfiguration
model part strictly on this particular architecture (set). Not only
would this allow interpreting the found instances and traversing the
sequence of instances more easily (as they will all be based on the
shared architecture), but the instance finding will also require
significantly less time (an example with an immutable component
composition featuring 5 components, 10 ports, and 7 connections is
illustrated in Table 1, rows 3-4).
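
To make the time saving concrete, the following short sketch computes the ratio of the average instance-finding times reported in Table 1 for the reconfiguration part without and with a fixed sub-instance. The dictionary keys and the helper name `speedup` are illustrative only; the numbers are the averages from the table.

```python
# Average instance-finding times in ms, copied from Table 1.
avg_ms = {
    "structural": 37.8,
    "compositional": 145.6,
    "reconfiguration": 410.0,
    "reconfiguration_fixed": 46.5,
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times shorter the second measurement is."""
    return before_ms / after_ms


ratio = speedup(avg_ms["reconfiguration"], avg_ms["reconfiguration_fixed"])
print(f"average speed-up with a fixed sub-instance: {ratio:.1f}x")
```

On these averages the fixed sub-instance brings the reconfiguration part close to the cost of the structural part alone.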

3.1. Method Description 

The multilevel modeling method consists of several steps:
(i) identification of the model partitions and their dependencies,
(ii) acquisition/creation of a suitable sub-instance, and (iii) making a
model partition immutable according to the sub-instance. In general, all
these steps can be performed manually; however, we aim to automate them
as much as possible. In this section, we outline possible ways of
addressing these individual steps.

Identification of model partitions and their dependencies. The Alloy
Analyzer allows splitting the specification into several separate
modules and declaring dependencies between these modules. We exploit
this mechanism to address the task of identifying model partitions. A
model partition is therefore recognized as an Alloy module. For more
fine-grained partitioning of Alloy models, it is possible to introduce
an independent definition of model partitions and their dependencies,
for example by explicitly enumerating the relevant Alloy constructs
included in each partition.

Acquisition of a suitable sub-instance. An immutable model sub-instance
can be obtained in several ways. First, it can be constructed manually.
Second, it can be extracted from a previously found and analyzed
instance with the assistance of a specialized tool. A third possibility
is to employ methods adopted by the related Alloy model synthesis
approaches; for example, in [5], Java object structures are interpreted
as Alloy model instances. In our preliminary experiments, we have
employed a variant based on the second case.

Making a model partition immutable according to a sub-instance. There
are basically two possible methods of making a model partition
immutable. The former is an explicit creation of an Alloy specification
which enforces the particular sub-instance. The latter is to use the
Kodkod API [9,10], which allows providing the underlying SAT solver with
a particular sub-instance directly. Although the usage of the Kodkod API
is considered more effective, the former approach brings a noticeable
performance improvement as well (Table 1, rows 3-4). Since an Alloy
specification is transformed into a SAT formula, making a sub-instance
immutable corresponds to assigning some of the variables of the
corresponding SAT formula. Technically, the Alloy specification
enforcing a particular sub-instance causes immediate assignment of some
of the SAT variables (during BCP, i.e., Boolean constraint propagation,
in DPLL-based [1] SAT solvers [2]). In our preliminary work, we have
adopted the former method because of its comprehensibility and
implementation simplicity.
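
The effect of BCP mentioned above can be illustrated with a minimal sketch of unit propagation over a CNF formula. This is a textbook-style illustration, not the propagation routine of any particular SAT solver; the function name `unit_propagate`, the DIMACS-style integer literals, and the example formula are assumptions made for the illustration. Unit clauses stand for the constraints that pin down the fixed sub-instance, and propagation assigns every variable they force before any search starts.

```python
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means not x3


def unit_propagate(clauses: List[Clause]) -> Optional[Dict[int, bool]]:
    """Repeatedly assign variables forced by unit clauses.

    Returns the forced partial assignment, or None on a conflict.
    """
    assignment: Dict[int, bool] = {}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


# Hypothetical formula: variables 1 and 2 encode tuples of the fixed
# sub-instance, variable 3 is forced by them, 5 and 6 remain open.
formula = [[1], [-2], [-1, 2, 3], [5, 6]]
forced = unit_propagate(formula)
print(forced)
```

Only the variables left unassigned after propagation (5 and 6 in this toy formula) still have to be explored by the solver, which is the intuition behind the shorter instance-finding times.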

3.2. Tooling and Workflow Proposal 

In this section, we propose the required workflow and tooling for
automated multilevel modeling support in the Alloy Analyzer.

As the identification of model parts is based on Alloy modules, it is
not necessary to provide any additional support for this step, since the
Alloy library already provides all the required support. Since no user
interaction is required in this step, it does not influence the
workflow.

Concerning acquisition of suitable sub-instances, the idea is to
integrate the support for selection of sub-instances directly into the
current Alloy instance visualization tool. The selection of an immutable
sub-instance can then be performed either directly, by selecting the
immutable parts of the current instance, or indirectly, by selecting the
variable parts of the current instance (everything except the selected
parts is considered fixed). It is also necessary to support storage of
the selected sub-instances.

As for making a sub-instance immutable, the Alloy specification
enforcing a sub-instance can be generated from the sub-instance XML
description provided by the Alloy Analyzer. The specification can be
automatically combined with the original model so that the set of
possible instances will be constrained to the ones sharing the
sub-instance. The only change to the workflow caused by this step is the
selection of a stored sub-instance before the actual instance finding. A
prototype of the supporting tool is currently being developed.
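
The generation step can be sketched as follows. This is not the prototype tool mentioned above: the XML layout is a simplified assumption modeled on an instance export (`sig` elements containing `atom` elements, `field` elements containing `tuple` elements), and the chosen encoding (one `one sig` per atom plus a fact listing the fixed tuples) is just one possible way to enforce a sub-instance. The names `SAMPLE_XML`, `atom_name`, `sub_instance_to_alloy`, and `FixedSubInstance` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified sub-instance description (hypothetical content).
SAMPLE_XML = """
<alloy>
  <instance>
    <sig label="this/System"><atom label="System$0"/></sig>
    <sig label="this/Process">
      <atom label="Process$0"/><atom label="Process$1"/>
    </sig>
    <field label="subcomponents">
      <tuple><atom label="System$0"/><atom label="Process$0"/></tuple>
      <tuple><atom label="System$0"/><atom label="Process$1"/></tuple>
    </field>
  </instance>
</alloy>
"""


def atom_name(label: str) -> str:
    """Turn an atom label such as 'System$0' into a usable identifier."""
    return label.replace("$", "_")


def sub_instance_to_alloy(xml_text: str, fact_name: str = "FixedSubInstance") -> str:
    """Emit Alloy text that pins the atoms and tuples of the sub-instance."""
    root = ET.fromstring(xml_text)
    lines = []
    # One singleton signature per fixed atom.
    for sig in root.iter("sig"):
        sig_name = sig.get("label").split("/")[-1]
        for atom in sig.findall("atom"):
            lines.append(f"one sig {atom_name(atom.get('label'))} extends {sig_name} {{}}")
    # One constraint per fixed tuple of every field.
    lines.append(f"fact {fact_name} {{")
    for field in root.iter("field"):
        for tup in field.findall("tuple"):
            atoms = " -> ".join(atom_name(a.get("label")) for a in tup.findall("atom"))
            lines.append(f"  ({atoms}) in {field.get('label')}")
    lines.append("}")
    return "\n".join(lines)


spec = sub_instance_to_alloy(SAMPLE_XML)
print(spec)
```

The generated text would be appended to (or imported by) the original model; constraining the fields with `in` fixes the listed tuples while leaving the rest of the relation free for the partition under analysis.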

3.3. Applicability of the Method 

The proposed method is applicable only to a particular class of Alloy
models, i.e., models with an inherent internal structure. In general,
the model is required to be composed of inter-dependent parts (one part
can employ constructs defined in another part), where the dependencies
between the individual model parts have to form a directed acyclic graph
(a typical case is a layered model structure, i.e., a hierarchy).
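
The acyclicity requirement can be checked mechanically. The sketch below uses Python's standard `graphlib` module; the function name `development_order` and the dependency map of the three AADL model parts are illustrative assumptions, not part of the proposed tooling.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def development_order(depends_on: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return the parts with dependencies first, or None if they are cyclic."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Each part maps to the set of parts whose constructs it employs.
layered = {
    "structural": set(),
    "compositional": {"structural"},
    "reconfiguration": {"structural", "compositional"},
}
order = development_order(layered)
print(order)
```

A returned order also suggests which partitions can be fixed when analyzing a given part: everything that precedes it in the list.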

For example, as illustrated in Section 2, every common component model
is eligible for the multilevel modeling method. In general, the inherent
structure of a component model comprises three parts: a structural
specification of components (e.g., provided/required interfaces,
interface types), a compositional specification (e.g., component
bindings), and a specification of architectural changes (e.g.,
reconfigurations of component bindings).

Model parts typically represent the individual concerns and levels of
abstraction of the modeled system. Moreover, these parts are typically
developed separately in a particular order (e.g., in the case of a
component system, the parts are developed in the same order as listed in
the previous paragraph).


4. Related Work 

In [5], the authors use a set of instances to automatically generate an
executable Alloy model which represents the properties shared by the
instances. The instances are determined by given Java object structures.
In comparison with the multilevel modeling method, this work employs the
given instances to completely synthesize a fitting model, whereas in our
case the instances are employed for synthesis of an extension of an
existing model. In both cases, the goal is to enforce the shared
properties of the instances in the resulting Alloy model.

In [6], the author presents a method called Modeling by Example (MBE),
based on automated model refinement. MBE generates near-hit and
near-miss examples for the user to decide whether they should be
included in or excluded from the target Alloy model. The model is
continuously refined according to these choices. In some sense, the MBE
method represents an enhancement of the original example-driven model
analysis by automated processing of the user decisions. Similar to the
multilevel modeling method, the goal is to generate instances which
carry meaningful information with respect to the model in development
(i.e., instances without uninteresting variability). However, in the
context of multilevel modeling the 'uninteresting' instances have to be
eliminated manually, and the method itself serves only for easier
identification of such instances.

In [8], the goal is to speed up Alloy model verification with respect to
a particular property by separate verification of programmatically
identified model parts and composition of the verification results.
Compared to the proposed method, the method in [8] is tool-oriented
rather than model-developer-oriented; i.e., it aims at speeding up an
automated process, whereas our approach aims at speeding up a manual
process. Still, both methods exploit separate analysis of individual
model parts.


5. Conclusion and Future Work 


Alloy has been gaining popularity due to its ability to provide rapid
feedback by employing example-driven model analysis. However,
development of Alloy models of large-scale complex systems introduces
various problems which reduce this feedback. In this paper, we have
described these problems. We have also provided a real-life case study
for illustration. To address these problems, we proposed a simple
developer-oriented method, applicable to a specific class of models
(i.e., models with an inherent internal structure), based on making
particular model parts immutable in order to allow constrained
example-driven analysis of the model part of interest. The proposed
method allows focusing on development of a particular model part and
thus increases the feedback of the example-driven analysis. Currently,
we are implementing the first prototype of a tool which integrates
multilevel modeling into the standard Alloy Analyzer workflow. The tool
is based on programmatic transformation of an XML description of a given
sub-instance into a corresponding Alloy specification. As future work,
we plan to implement a more elaborate tool which would allow quick and
interactive diagram-based construction of sub-instances both from
scratch and from previously found instances (an interesting option is to
allow selection of the variable parts in a given instance instead of the
immutable parts). We also plan to employ the Kodkod API to enforce the
given sub-instances.
Abstract


The paper deals with the detection of increased losses in a gas
distribution system. It presents the issues of transport, metering, and
estimation of natural gas consumption worldwide, with emphasis on the
situation in the Czech Republic. It then presents a project that
investigates the detection of losses in selected closed localities
within the distribution network of RWE GasNet, s.r.o.


Introduction

For natural gas traders, losses in the distribution system mean
financial losses. It is therefore understandable that there is an effort
to minimize them. Unfortunately, as described in more detail in the
following paragraphs, losses of natural gas are practically
unmeasurable. Measuring the losses exactly (or at least with
satisfactory accuracy) would mean equipping all inputs and outputs of
the network with detailed interval metering, for example at daily
resolution. At present, however, this solution is very expensive (it
involves one-off costs of purchasing the meters as well as long-term
costs of continuously processing the huge volume of measured data). For
this reason, mathematical models of gas consumption are commonly used.
The following text outlines the situation in the Czech Republic and
describes the transport and distribution of natural gas, the metering of
its consumption, and the estimation of consumption using mathematical
models. It then describes a project whose goal is to detect increased
losses in selected closed localities within the distribution network of
RWE GasNet, s.r.o. Within this project, methods are being developed that
use the mathematical models described below to identify areas with
increased losses.


1. Transport and Distribution of Natural Gas in the Czech Republic
1.1. Sources of Natural Gas

There are four types of natural gas "sources" in the Czech Republic:
imports, domestic production, storage facilities, and accumulation. The
main source of gas is imports; the other sources are rather
supplementary (according to the Plynárenská příručka [25], domestic
production covers on the order of a few percent of the consumption in
the Czech Republic). The flow of imported gas is relatively steady over
the year, whereas the variation of consumption within the year is
enormous. This is because a substantial part of natural gas is consumed
for heating. Winter consumption then amounts to up to ten times the
summer consumption.

These fluctuations in consumption need to be balanced.

In principle, there are two options:

1. uneven production from the sources,

2. use of storage facilities.

In the Czech Republic, only option 2 comes into consideration, given the
negligible domestic production mentioned above. Storage facilities
therefore play a key role in balancing the uneven consumption over the
year. The Plynárenská příručka [25] distinguishes two types of storage
facilities according to their use:

seasonal sources - characterized by a large storage capacity, but
    withdrawal and injection are relatively slow for these facilities;
    these are mainly so-called aquifer storage facilities (natural
    underground water reservoirs used for storing gas) and depleted oil
    or gas fields,

peak sources - serve mainly to cover peak consumption (the coldest
    days), but also to balance short-term fluctuations; they are
    characterized by a smaller storage capacity but faster withdrawal;
    examples are salt or artificial caverns, gasholders, and storage of
    liquefied natural gas (LNG; not used in the Czech Republic).

A special type of storage is the accumulation of gas in the pipelines.
The accumulation of the system is defined as the total volume of gas in
the transmission system. Because of the gaseous state, this volume (and
with it the pressure) can fluctuate quite considerably. To some extent,
this can be used to regulate the uneven consumption within a day.

1.2. Transport of Natural Gas

Worldwide, there are two types of natural gas transport:

1. pipeline transport,

2. transport of liquefied natural gas (LNG) by tankers.

In the Czech Republic, naturally, only the first type is used. Even
worldwide, however, tanker transport is, with few exceptions, combined
with pipeline transport.

The natural gas transport network in the Czech Republic consists of
several categories of pipelines:

low-pressure pipelines - operating pressure up to and including 5 kPa;
    used for in-house distribution or for distribution in smaller
    municipalities; the gas pressure does not need to be adjusted
    further before use in appliances,

medium-pressure pipelines - operating pressure up to and including
    400 kPa; used where higher capacity and flexibility of the network
    are needed (when connecting to a medium-pressure line, the customer
    must obtain a regulator to adjust the gas pressure to the value
    required for operating the appliances),

high-pressure pipelines - operating pressure up to and including 4 MPa;
    used mainly for domestic long-distance transport of gas to the
    individual gasified municipalities,

very-high-pressure pipelines - operating pressure above 4 MPa; used
    mainly for international long-distance transport of gas.
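
The pressure limits of the four pipeline categories listed above can be written down as a small lookup. The function name `pipeline_category` and the choice of kPa as the input unit are illustrative assumptions; the thresholds are the ones given in the list.

```python
def pipeline_category(pressure_kpa: float) -> str:
    """Classify a pipeline by its operating pressure given in kPa."""
    if pressure_kpa < 0:
        raise ValueError("operating pressure must be non-negative")
    if pressure_kpa <= 5:
        return "low-pressure"
    if pressure_kpa <= 400:
        return "medium-pressure"
    if pressure_kpa <= 4000:  # 4 MPa
        return "high-pressure"
    return "very-high-pressure"


for p in (2.0, 300.0, 2500.0, 6300.0):
    print(p, "kPa ->", pipeline_category(p))
```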

Gas imported from Russia enters the territory of the Czech Republic at
the Lanžhot transfer station. From the long-distance transmission
system, natural gas (the part that is not exported to other countries)
passes through the transfer stations of the domestic system. In these
transfer stations, the gas pressure is also adjusted to the value usual
in the given domestic network. Through the domestic network, natural gas
is transported to towns and municipalities, or to direct customers
(typically large industrial plants).

1.3. Metering of Natural Gas Consumption

Data on the amount of gas transported are naturally important both for
the distribution network operator (transport fees) and for gas traders
(an idea of the amount of gas consumed). Transfer stations are equipped
with interval metering, which provides hourly values of the amount of
gas that has flowed through these points. The gas consumption of
individual customers is also metered, so that the gas taken and the
transport fees can be billed on the basis of the measurement. Assuming
constant accumulation (whose changes can be estimated from pressure
measurements at the transfer points), the difference between the input
and the consumed amount of gas corresponds to the losses in the given
section of the distribution system.
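
The balance described above can be sketched in a few lines. The function name `estimated_losses`, the argument names, and the numbers in the example are hypothetical; the optional correction for a change in accumulation follows from a simple mass balance and is an addition to the constant-accumulation case stated in the text.

```python
def estimated_losses(
    inflow: float,
    metered_consumption: float,
    accumulation_change: float = 0.0,
) -> float:
    """Losses in a closed section over one period (all in the same unit).

    inflow               gas that entered through the transfer stations
    metered_consumption  gas taken by all customers in the section
    accumulation_change  increase of the gas volume held in the pipes
                         (zero under the constant-accumulation assumption)
    """
    return inflow - metered_consumption - accumulation_change


# Hypothetical daily figures for one closed locality.
print(estimated_losses(1000.0, 950.0))        # constant accumulation
print(estimated_losses(1000.0, 950.0, 20.0))  # part of the gas stayed in the pipes
```

The difficulty discussed in the following paragraphs is that `metered_consumption` is not actually known for a given day when most customers are read only once a year.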

These losses, however, cannot be calculated exactly at any moment, even
if we neglect the effect of accumulation. The problem lies in the
metering of consumption. Individual points of delivery are equipped (as
a rule according to their average annual consumption) with one of three
types of meters [30]:

type A meter - interval meter with remote transmission; the measured
    hourly values are continuously sent to the dispatching center,

type B meter - interval meter without remote transmission; the measured
    hourly values are read out manually, usually once a month, using a
    portable device,

type C meter - without interval metering; only the total amount of gas
    taken is recorded and read at certain intervals (one to eighteen
    months); the consumption is determined as the difference of two
    consecutive readings.
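
For a type C meter, the only available information is therefore a sequence of cumulative readings. A minimal sketch of turning them into consumption per reading interval follows; the function name `consumption_between_readings` and the sample readings are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Reading = Tuple[date, float]  # (reading date, cumulative meter state)


def consumption_between_readings(readings: List[Reading]) -> List[Tuple[date, date, float]]:
    """Return (from, to, consumption) for each pair of consecutive readings."""
    ordered = sorted(readings)
    return [
        (d0, d1, v1 - v0)
        for (d0, v0), (d1, v1) in zip(ordered, ordered[1:])
    ]


readings = [
    (date(2010, 1, 15), 1200.0),
    (date(2011, 1, 20), 2950.0),
    (date(2012, 2, 1), 4600.0),
]
for start, end, used in consumption_between_readings(readings):
    print(start, end, used)
```

Each value covers the whole interval between two readings, so nothing can be said directly about how the consumption was distributed over the individual days.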

Daily values of losses could be determined exactly only if all points of
delivery were equipped with type A metering, or with type B metering
(losses could then be determined retrospectively at the end of each
calendar month). Equipping customers with this type of metering is,
however, very expensive (the purchase price of a meter is in the tens of
thousands of Czech crowns). Therefore most points of delivery, above all
those of households and small customers (i.e., customers with an annual
consumption of up to 630 MWh), but also of some medium customers
(customers with an annual consumption between 630 and 4200 MWh), are
equipped with type C metering.

To obtain consumption data from a type C meter, a physical reading has
to be taken. This consists of an authorized worker reading the state of
the gas meter and entering it into a database. Taking physical readings
of all customers in a single day is very demanding both technically and
economically. For this reason, most gas suppliers in the Czech (and also
the Slovak) Republic have adopted so-called cyclic readings. In each
month of a given year, a certain part of the customers is read (the
distribution is not entirely uniform because access to gas meters is
more difficult in the summer months due to frequent holidays). When
planning the reading routes, it must be ensured that the interval
between two readings of a given customer is not longer than 18 months.
Typically, the interval between readings is one year; for larger
customers, however, monthly reading is not exceptional.

A consequence of cyclic reading is that the total volume of gas consumed
in a given section of the distribution system is not known exactly at
any moment. It follows that the losses cannot be determined exactly
either. Leaving aside the currently economically unfeasible option of
equipping all customers with type A metering, an estimate has to be used
for selected tasks (determination of losses, the amount of unbilled gas,
the trade balance, etc.). The following chapter describes several
mathematical models used to estimate the consumption of customers with
type C metering in various situations.

2. Modeling of Natural Gas Consumption

Given the complications of natural gas distribution described in
chapter 1 (steady imports, uneven consumption, difficult storage,
expensive metering), modeling of natural gas consumption is a very
important tool for distribution companies. The models described in the
literature share several features. Typically, the model includes a
classification of customers based on their consumption behavior. The
simplest form is a classification by the level of consumption (e.g.,
small customer - medium customer - large customer, or finer tariff
bands), but classifiers such as the way the natural gas is used
(cooking, water heating, space heating, technological consumption) or
the type of business (production premises, services, agriculture, etc.)
and others are also used. The choice of a suitable classification is
demanding and depends, among other things, on the technical capabilities
of the particular distribution company and on the correctness of the
data in its database. The resulting classes should be as homogeneous as
possible in the annual course of consumption, yet clearly
distinguishable from one another. A non-negligible requirement is also a
sufficient number of customers in each class.

The models described in the literature use a wide variety of explanatory
variables. Meteorological variables are typically included. The
dependence of consumption on temperature was described already in [1].
Other meteorological variables are also considered, such as wind speed
and direction, sunshine, atmospheric pressure, or precipitation. The
temperature response is often very complex; at the least, a different
temperature dependence is considered for warm and cold days. There is
the concept of so-called "heating degree days", which consists in
neglecting the effect of temperature on consumption above a certain
specified threshold [12, 27]. A similar concept is used, for example, in
the GAMMA model used in the Czech Republic [13, 28]. Other commonly used
predictors are calendar effects such as the day of the week or the
distinction between working and non-working days, public holidays,
Christmas, Easter, and others. Other variables are also used, such as
the price of gas or of oil, to which the price of gas is tied [24].
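
As an illustration of the heating degree days idea, the sketch below uses the common textbook definition in which each day contributes the amount by which its mean temperature falls below a base temperature, and nothing above it. The base temperature of 18 °C and the function name `heating_degree_days` are assumptions for the example; the cited models may use a different threshold or a smoother transition.

```python
from typing import Iterable


def heating_degree_days(daily_mean_temps_c: Iterable[float], base_temp_c: float = 18.0) -> float:
    """Sum of daily shortfalls of the mean temperature below the base."""
    return sum(max(0.0, base_temp_c - t) for t in daily_mean_temps_c)


# Four hypothetical days: cool, exactly at the base, warm, and frosty.
temps = [10.0, 18.0, 25.0, -2.0]
print(heating_degree_days(temps))
```

Days warmer than the base contribute zero, which is exactly the "no temperature effect above the threshold" simplification described above.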

Models can further be divided according to the way they are used:

1. Estimation of individual consumption:

a) allocation of a known consumption into shorter time periods (for
   example, when the price of gas changes between two regular readings
   of a type C meter),

b) estimation of consumption over a certain time period in the past (for
   example, when the law requires an invoice to be issued to the
   customer but no metering data are available, so that the consumption
   from the last reading to the moment of invoicing has to be
   estimated),

c) prediction of individual consumption in the future.

2. Estimation of the consumption of larger groups of customers:

a) estimation of unbilled gas (the total amount of gas that has been
   consumed but not billed),

b) balance in the system (estimation of losses or their allocation among
   the individual market participants),

c) prediction of the total consumption of the whole customer base.

Allocation (case 1a) is the task with the greatest availability of the
necessary data. For this reason, it is in a sense simpler than the
remaining tasks. Problems 1b and 1c differ mainly in the availability of
the values of the explanatory variables, e.g., the mean air temperature.
In the international literature, most publications are devoted to the
prediction of total consumption (2c) [9, 10, 12, 17, 21-24], fewer to
the construction of so-called standardized load profiles (typové
diagramy dodávky, TDD), i.e., task 2b [11].

Different tasks can, of course, sometimes be solved using the same
model, but in general it is not possible to construct a model that
solves all of them optimally. Using a unified approach to different
problems is always a compromise. Nevertheless, given the natural
tendency to minimize costs, this requirement is common in practice.

To solve task 1a, the so-called heating curve (otopová křivka) [29] was
formerly used in the Czech Republic. It was a set of twelve monthly
coefficients giving the empirically determined share of a given month in
the gas consumption for heating. For customers with low consumption, for
whom heating by gas was not assumed, the allocation was done uniformly.
Nowadays, the TDD model [30] is used for the allocation. It is now also
used to substitute for a reading (task 1b). The task of predicting
individual consumption (1c) was addressed for customers with type B
metering, e.g., in [3].
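
The allocation by a heating curve can be sketched as a proportional split. The twelve coefficients in `HEATING_SHARE` are made-up illustrative values (chosen to sum to one), not the coefficients from [29]; the function name `allocate` and the simplification to whole calendar months are also assumptions.

```python
from typing import Dict, List

# Hypothetical monthly shares of annual heating consumption (sum to 1).
HEATING_SHARE: Dict[int, float] = {
    1: 0.19, 2: 0.16, 3: 0.13, 4: 0.08, 5: 0.03, 6: 0.01,
    7: 0.01, 8: 0.01, 9: 0.03, 10: 0.08, 11: 0.12, 12: 0.15,
}
# Uniform shares, as used for customers assumed not to heat with gas.
UNIFORM_SHARE: Dict[int, float] = {m: 1.0 / 12.0 for m in range(1, 13)}


def allocate(total: float, months: List[int], shares: Dict[int, float] = HEATING_SHARE) -> Dict[int, float]:
    """Split a known consumption over the given months proportionally to the shares."""
    weight = sum(shares[m] for m in months)
    return {m: total * shares[m] / weight for m in months}


# 1200 units consumed between readings covering November to February,
# with a price change on 1 January.
parts = allocate(1200.0, [11, 12, 1, 2])
before_change = parts[11] + parts[12]
after_change = parts[1] + parts[2]
print(round(before_change, 2), round(after_change, 2))
```

Passing `UNIFORM_SHARE` instead of `HEATING_SHARE` reproduces the uniform allocation mentioned for customers with low consumption.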

For the estimation of the unbilled component of gas (2a), the nonlinear
regression model GAMMA was constructed [8, 13, 28]. For the settlement
of imbalances (2b), the somewhat more complex TDD model of the GAM class
(generalized additive models) is used [4, 5, 30]. A solution to the
problem of loss estimation is only being prepared in the Czech Republic.
The prediction of total consumption (2c) was addressed, e.g., by the
ELVÍRA system [18, 19], which was based on methods of time series
analysis.

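The following paragraphs describe in more detail the GAMMA and TDD
models for natural gas consumption estimation, developed at ÚI AV ČR
(the Institute of Computer Science of the Academy of Sciences of the
Czech Republic) in cooperation with participants in the Czech gas
market.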

2.1. The GAMMA Model

The GAMMA model [13, 28] was primarily intended for the estimation of
unbilled gas [8]. Its possible use for other practical tasks is
discussed in [2]. It is based on estimates of individual consumption,
which are subsequently aggregated over specified customer classes. The
total amount of unbilled gas is determined as the sum of the individual
estimates of the customers in a given zone. From 2004 to 2009, the model
was used routinely at Západočeská plynárenská, a.s. (hereinafter ZČP).
In 2010, the model was taken over by RWE GasNet, s.r.o. (hereinafter
RWE), which is considering its use for estimating losses in so-called
closed localities. This issue is described in more detail in chapter 3.

Odbérná místa jsou klasifikována dle typu klienta 
(domácnost, maloodbér) a dle charakteru odběru 
(vaření, ohřev vody, vytápění, technologický odběr). 
Uvažovány jsou všechny kombinace prvních tří charak¬ 
terů (celkem 7 tříd pro domácnosti a 7 tříd pro malo- 
odběr) a dvě třídy pro technologický odběr (čistě tech¬ 
nologický a v kombinaci s vytápěním). Celkem máme 
tedy k dispozici 16 zákaznických tříd. Vybrané parame¬ 
try modelu jsou společné všem zákazníkům dané třídy. 


Základním časovým rozlišením modelu GAMMA je 
den, typické použití je však pro odhad spotřeby za 
delší časové období (1 až 18 měsíců). Jako vysvětlující 
proměnné se používají průměrné denní teploty vzduchu 
v daném regionu a také dlouhodobý teplotní normál. 
Model je relativně jednoduchý s ohledem na to, že 
pro odhad parametrů nejsou a nikdy nebyla k dispo¬ 
zici denní data. Pro odhad parametrů se využívá kom¬ 
binace mimořádných měsíčních odečtů náhodně vy¬ 
braných cca 1700 zákazníků a dále údajů z řádných 
(zpravidla roěních) odečtů celěho zákaznického kmene 
(očištěného od podezřelých hodnot). 

For the consumption of customer $i$ of class $k$ on day $d$, the following nonlinear regression model is defined:

$$Y_{ikd} = \mu_{ik}\,\varphi_{kd} + \varepsilon_{ikd}, \qquad (1)$$

where

$\mu_{ik}$ is the individual parameter of customer $i$, determining the global (time-independent) level of the customer's consumption,

$\varphi_{kd}$ is the systematic part of the model, common to class $k$,

$\varepsilon_{ikd}$ is the random component, assumed to have zero mean and variance proportional to the mean daily consumption, i.e. to the term $\mu_{ik}\varphi_{kd}$.

The individual parameter $\mu_{ik}$ is estimated by weighted least squares using the regular readings of the given customer up to three years back. The length of the history used is also one of the class-specific model parameters. In the past, estimation of $\mu_{ik}$ by NLME methods (nonlinear mixed-effects models) [14, 15] was also tried. This approach was ultimately not deployed because of its considerably higher computational cost.

The systematic part $\varphi_{kd}$ has the following form:

$$\varphi_{kd} = Z_{kd}\left(\Psi_d \cdot \exp\{-\gamma_k f(T_d, N_d)\} + p_k\right) + (1 - Z_{kd})\,q_k, \qquad (2)$$

where

$Z_{kd}$ is an indicator variable equal to 1 if the average temperature over the last three days ($d$, $d-1$, $d-2$) is below 14 °C (indicating the winter period) or if class $k$ is a "non-heating" class (its customers do not use gas for heating); otherwise $Z_{kd}$ equals 0,

$\Psi_d$ is the seasonal component with annual periodicity, common to all classes,

$\gamma_k$ is a parameter giving the degree of temperature dependence,

$f(T_d, N_d)$ is a function of the average daily temperature $T_d$ and the normal temperature $N_d$, of the form

$$f(T_d, N_d) = T_d - N_d, \qquad (3)$$

$p_k$ is the constant (non-seasonal) component of consumption in the "winter" period,

$q_k$ is the constant component of consumption in the "summer" period (only for "heating" classes).

The indicator variable $Z_{kd}$ serves to "switch" between winter and summer operation for customers who use gas for heating. For these customers, using a single functional form throughout the year turned out to give unsatisfactory accuracy of the consumption estimate in the summer months. Using three-day temperature averages instead of fixed calendar dates partly solves the problem of the so-called transition periods, i.e. the beginning and end of the heating season, which occur at a different time each year. These periods are also very sensitive, because differing customer behaviour increases the variability of consumption among customers.

The seasonal component $\Psi_d$ was estimated once, nonparametrically, from monthly values of the input into the ZČP distribution network. The monthly values were then interpolated by a polynomial, which gave the daily values $\Psi_d$. This estimate is subsequently used as an explanatory variable in the nonlinear regression model given by equations (1) and (2), i.e. for a given day of the year it is treated as a fixed constant. The reason for this procedure was that no other data with sufficiently fine time resolution were available when the model was designed.

The form of the temperature function $f(T_d, N_d)$ was chosen for simplicity and ease of interpretation. Both the actual and the normal temperatures are, however, truncated from above at 14 °C before the calculation. The reason is the experimental finding that the temperature dependence of consumption vanishes at approximately this value.
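The systematic part (2) with the temperature function (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, the seasonal value for the day is passed in as a plain number, and all parameter values in the usage below are made up.

```python
import math

TEMP_CAP_C = 14.0  # both temperatures are truncated from above at 14 °C (from the text)


def temperature_term(t_day, t_normal, cap=TEMP_CAP_C):
    """f(T_d, N_d) = T_d - N_d, evaluated on temperatures capped at `cap`."""
    return min(t_day, cap) - min(t_normal, cap)


def winter_indicator(last_three_temps, heating_class, cap=TEMP_CAP_C):
    """Z_kd: 1 if the mean temperature of days d, d-1, d-2 is below `cap`
    or the class does not use gas for heating; otherwise 0."""
    mean_temp = sum(last_three_temps) / len(last_three_temps)
    return 1 if (mean_temp < cap or not heating_class) else 0


def systematic_part(z, psi_d, gamma_k, p_k, q_k, t_day, t_normal):
    """Systematic part of equation (2) for one class and one day."""
    f = temperature_term(t_day, t_normal)
    return z * (psi_d * math.exp(-gamma_k * f) + p_k) + (1 - z) * q_k


# Hypothetical values: a cold day in a heating class.
z = winter_indicator([3.0, 5.0, 4.0], heating_class=True)
print(systematic_part(z, psi_d=2.0, gamma_k=0.1, p_k=0.5, q_k=0.3,
                      t_day=3.0, t_normal=5.0))
```

Keeping the temperature term in its own function mirrors the modularity mentioned in the text: a different temperature response only requires replacing `temperature_term`.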

The remaining model parameters, i.e. $\gamma_k$, $p_k$ and $q_k$, are estimated by the method of least absolute deviations using data from the extraordinary monthly readings. If there are not enough extraordinary readings, regular readings are used. This method was chosen because it gave more stable estimates than the more traditional least-squares method.

The modularity of the model allows partial changes without substantially disturbing its structure, and hence the parameter estimation process. If, for example, we want to change the shape of the model's temperature response, we can simply replace the function $f$ by another suitable function. During development, emphasis was placed on maximum accuracy of the estimate of total unbilled gas, i.e. the estimate of the consumption of a relatively large aggregate over a relatively long period. For use in detecting increased losses, certain modifications may be necessary.

3. Detection of increased losses

A methodology for detecting an anomalous time course of losses in a closed locality, using the GAMMA model described in Section 2, is currently being developed in cooperation with the distribution company RWE GasNet, s.r.o. Examples of atypical behaviour are a short-term increase in losses (e.g. during an accident) or persistently higher losses (e.g. due to illegal off-take). The project includes acquiring data from several dozen closed areas across the company's entire distribution network (practically the whole Czech Republic except Prague and the South Bohemian Region).

A closed locality is a part of the distribution system that has one or more metered inputs and outputs only at end customers. Typically it comprises several smaller municipalities (on the order of 500-1000 customers). An example of such a locality is shown as a map in Figure 1.

Figure 1: Example of a closed locality.

All gas that flows into the area should be consumed. The difference between the inflow into the system and the gas consumption represents the losses in the system. The problem is that, unlike the input to the system, consumption (and hence losses) cannot in practice be measured with sufficient time resolution, as described in Section 1.

An estimate of the daily losses is obtained from consumption estimates (e.g. by the GAMMA model): from the measured input volume on a given day we subtract the measured consumption of large customers and the estimated consumption of small customers. The result is, for each area, a series of daily loss estimates.
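The subtraction described above can be illustrated by a minimal Python sketch. The function name and the three input lists are hypothetical, and the numbers in the usage example are made up; the estimated consumption of small customers would in practice come from the GAMMA model.

```python
def daily_loss_estimates(inflow, large_measured, small_estimated):
    """Daily loss estimate = measured inflow - measured consumption of large
    customers - estimated consumption of small customers (one value per day)."""
    if not (len(inflow) == len(large_measured) == len(small_estimated)):
        raise ValueError("all three series must cover the same days")
    return [v - m - e for v, m, e in zip(inflow, large_measured, small_estimated)]


# Two hypothetical days; a negative value means the model overestimated consumption.
print(daily_loss_estimates([100.0, 120.0], [30.0, 40.0], [60.0, 85.0]))
```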

3.1. Data

The following data sets are available for the task:

1. Data from selected closed areas:

a) daily measured volumes of incoming gas,

b) daily measured consumption of all large and medium-sized customers,

c) billing readings of all customers in the given area (typically at yearly intervals).

2. Average daily temperatures in the individual regions (source: ČHMÚ) since 1999.

3. Supporting data:

a) extraordinary monthly readings of about 1700 customers from West Bohemia,

b) billing readings of the RWE customer base.

The volume of processed data is fairly large. Moreover, the data are in a rather raw state. Experience with the existing implementation of the GAMMA model in West Bohemia shows that the model can also serve as a tool for automated detection of errors in the data. This naturally does not remove all errors and has to be combined with "manual" searching.

An example of using the GAMMA model to find errors in the data is the cleaning of data from the extraordinary monthly readings. Let $Y_{im}$ denote the monthly consumption of customer $i$ in month $m$, $\hat{Y}_{im}$ its estimate by the GAMMA model, and $M$ the total number of measured months. For each customer and each month we introduce the penalty

$$P_{im} = \begin{cases} 1 & \text{if } |Y_{im} - \hat{Y}_{im}| > 0.75 \cdot \frac{1}{M}\sum_{t=1}^{M} Y_{it}, \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$

A customer is then flagged as suspicious if the sum $\sum_{m=1}^{M} P_{im}$ exceeds a set threshold. The validity of the readings of suspicious customers is subsequently verified by the data provider.

3.2. Solution

The main task is to compare the estimated loss processes in the individual measured areas. The aim is to identify areas with an increased volume of losses and then to fit them with interval metering at all points. First, the meter at the input will be replaced in order to rule out a possible systematic error of that meter. If the situation persists after a specified time, all off-take points will be fitted with interval meters so that the losses can be measured. This will be followed by an evaluation of the real (measured) losses against their estimates by the GAMMA model.

The task of loss detection is complicated by several problems that call for non-standard approaches:

1. The estimation error is not and never will be observed. On a given day only the input to the system is measured, not the output. The consumption of customers on that day is estimated by the GAMMA model. The difference between the measured input and the estimated consumption represents the loss estimate.

2. Very many variables come into play that may influence the loss estimate to a greater or lesser extent:

a) a different customer mix in the individual localities (number and consumption),

b) different consumption habits in different regions (e.g. different working hours, different requirements on the temperature in flats and houses, etc.),

c) different weather conditions (colder and warmer regions),

d) the error of the mathematical model itself and its variability (in time and between regions).

These factors must be taken into account, and their influence should be filtered out before the diagnostics proper.

3. The notion of "increased losses" is not entirely clearly defined. Before looking for detection methods it will be necessary to think carefully about what we actually want to detect. In essence this is an outlier detection problem, but with two kinds of outlyingness:

a) outlyingness within a locality (e.g. short-term illegal off-take, leakage during an accident, etc.),

b) outlyingness between localities (e.g. long-term illegal off-take, but also, e.g., a different customer mix).

It cannot be ruled out that different methods will be needed to detect the different types of outlyingness.
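Returning to the data-cleaning rule (4) of Section 3.1, the flagging of suspicious customers can be sketched as follows. This is an illustration only: the function names are hypothetical, the reference level is assumed to be the customer's mean measured monthly consumption, and the flagging threshold is left as an explicit argument because its value is not given here.

```python
def monthly_penalties(measured, estimated, factor=0.75):
    """Penalty P_im per month: 1 if |Y_im - Yhat_im| exceeds `factor` times the
    customer's mean measured monthly consumption (an assumption), else 0."""
    mean_measured = sum(measured) / len(measured)
    limit = factor * mean_measured
    return [1 if abs(y - y_hat) > limit else 0
            for y, y_hat in zip(measured, estimated)]


def is_suspicious(measured, estimated, threshold):
    """Flag the customer if the number of penalized months exceeds `threshold`
    (hypothetical parameter; choose it to suit the data)."""
    return sum(monthly_penalties(measured, estimated)) > threshold


# Hypothetical customer with four measured months.
print(monthly_penalties([10.0, 10.0, 10.0, 10.0], [10.0, 1.0, 19.0, 12.0]))
```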
4. First results

Data from 9 closed localities are currently available for testing (in the future this number should grow to the order of tens of localities). Because the data are confidential, the localities are distinguished by numerical codes in the following text. In these localities the consumption of all customers was estimated by the GAMMA model and compared with the total input to the system (after subtracting the interval-metered customers).

4.1. Criterion

In any operational use the detection process must be fully automatic (one cannot rely on manual inspection of hundreds to thousands of consumption curves). The choice of criterion for evaluating the losses in a locality is therefore crucial. In essence, the localities have to be ordered by the chosen criterion and increased attention then paid to the "worst" ones.

A natural approach is to use the deviations of the modelled consumption from the measured input (which represent the loss estimate in the given locality). Since the priority is to uncover long-term losses, it is appropriate in the first phase to choose criteria that reflect behaviour over a longer time span.

A further choice is between absolute and relative criteria. The advantage of absolute criteria, such as

$$Err_1(l) = \frac{1}{|T|}\sum_{t \in T} |\hat{Y}_{tl} - Y_{tl}|, \qquad (5)$$

where

$\hat{Y}_{tl}$ is the estimate of the total consumption of non-interval-metered customers on day $t$ in locality $l$,

$Y_{tl}$ is the measured input to locality $l$ on day $t$ after subtracting the consumption of all interval-metered customers,

$T$ is the evaluated period (currently the whole measured period from 1 June 2007 to 31 August 2010),

$|T|$ is the number of days in period $T$,

is their direct link to the size of potential financial losses (despite the relatively complicated system of gas prices for different customers, one can at least estimate how much money the company would be losing if the loss estimate were accurate). Distribution companies are, however, typically also interested in the relation to some whole. This leads to various types of relative criteria.

For example, the criterion

$$Err_2(l) = 100 \cdot \frac{\sum_{t \in T} |\hat{Y}_{tl} - Y_{tl}|}{\frac{1}{L}\sum_{i=1}^{L}\sum_{t \in T} Y_{ti}}, \qquad (6)$$

where $L$ is the total number of evaluated localities, leads to the same ordering of localities as criterion (5), but its values are related to the total average daily consumption of all evaluated localities. The value of the normalizing constant is of course irrelevant for the ordering; the normalization is used for the sake of interpretability of the results.

Criterion (6), however, "disadvantages" large localities, for which larger deviations can be expected simply because their overall consumption is higher. The localities penalized most are therefore those with the largest deviations among all localities. This can be avoided by using the criterion

$$Err_3(l) = 100 \cdot \frac{\sum_{t \in T} |\hat{Y}_{tl} - Y_{tl}|}{\sum_{t \in T} Y_{tl}}, \qquad (7)$$

which assesses the average daily deviation relative to the average daily consumption in the given locality. This criterion in turn penalizes small localities more, because their low consumption gives a low base. The localities penalized most are therefore those with high deviations within their own time course.

Since the GAMMA estimate naturally has its own error, which cannot be measured and which propagates into the loss estimate, one can also consider criteria that at least partly suppress the influence of this error. For example, one can choose the locality $\lambda$ minimizing criterion (7) and then use the criterion

$$Err_4(l) = \frac{1}{|T|}\sum_{t \in T} \left|(\hat{Y}_{tl} - Y_{tl}) - (\hat{Y}_{t\lambda} - Y_{t\lambda})\right|, \qquad (8)$$

or its relative form

$$Err_5(l) = 100 \cdot \frac{\sum_{t \in T} \left|(\hat{Y}_{tl} - Y_{tl}) - (\hat{Y}_{t\lambda} - Y_{t\lambda})\right|}{\sum_{t \in T} Y_{tl}}. \qquad (9)$$
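The difference between the absolute criterion (5) and the locality-relative criterion (7) can be illustrated by a small Python sketch. The function names and the two tiny made-up localities are hypothetical; the formulas follow the verbal descriptions of the two criteria (average absolute daily deviation, and the same deviation in percent of the locality's own consumption).

```python
def err_absolute(estimated, measured):
    """Criterion (5): average absolute daily deviation."""
    return sum(abs(e - y) for e, y in zip(estimated, measured)) / len(measured)


def err_relative(estimated, measured):
    """Criterion (7): total absolute deviation in percent of the locality's
    own total (equivalently average daily) consumption."""
    return 100.0 * sum(abs(e - y) for e, y in zip(estimated, measured)) / sum(measured)


def rank_localities(data, criterion):
    """Order localities from worst to best. `data` maps a locality code to a
    pair (estimated series, measured series)."""
    scores = {loc: criterion(est, meas) for loc, (est, meas) in data.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: locality 1 is small, locality 2 is large.
data = {1: ([10.0, 12.0], [11.0, 11.0]), 2: ([100.0, 100.0], [104.0, 96.0])}
print(rank_localities(data, err_absolute))  # the large locality ranks worst
print(rank_localities(data, err_relative))  # the small locality ranks worst
```

The two orderings differ, which matches the observation that absolute criteria penalize large localities while locality-relative criteria penalize small ones.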


4.2. Influence of model error

To assess the influence of model error, in addition to using criteria (8) and (9) described above, experiments were carried out with different versions of the GAMMA model. Besides the last operational version of the GAMMA model intended for estimating unbilled gas at Západočeská plynárenská, a.s. (from September 2009, denoted version 2.4.0), further versions were tested (with the same structure, differing only in the choices made when optimizing the parameters), namely:

version 2.4.1.0 - optimization in the standard way (using ZČP data),

version 2.4.1.1 - optimization using data from the RWE GasNet customer base,

version 2.4.1.2 - regional optimization of the parameters.

The criterion for comparing the individual versions, referred to as criterion (10), is the relative error of the estimate of the last reading over all customers, where

$Y_i$ is the last measured consumption of customer $i$,

$\hat{Y}_i$ is the estimate of the consumption of customer $i$ by the GAMMA model over the period corresponding to $Y_i$.

The computed values of criterion (10) for the tested versions are given in Table 1. A partial improvement is seen in each successive version.

Table 1: Comparison of the tested versions of the GAMMA model on the customer-base data according to criterion (10).

Version        2.4.0    2.4.1.0   2.4.1.1   2.4.1.2
Accuracy [%]   97.67    98.15     98.53     98.69

To test the influence of the model version on the comparison of localities, criterion (6) can be used, for example; its values are given in Table 2. Although we observe small differences (compared with the overall value) in the values of criterion (6), the resulting order of the localities does not change. It thus appears that the influence of the model version will be negligible, at least in the first phase of the evaluation.

Table 2: Comparison of the tested versions of the GAMMA model on the available closed localities, criterion (6).

            Model version
Locality    2.4.0     2.4.1.0   2.4.1.1   2.4.1.2
8           128.25    127.68    129.91    131.00
7            78.61     75.32     76.10     77.35
9            44.29     41.70     40.79     39.56
4            29.41     27.05     27.32     27.32
5            25.74     23.35     23.04     22.85
1            24.01     21.98     21.85     21.81
6            10.35      9.82      9.75      9.66
3             9.70      9.49      9.36      9.13
2             5.19      5.02      4.89      4.74

4.3. Evaluation

As shown in the previous section, the choice of model version has a negligible effect on the evaluation of losses in the individual localities. For the evaluation of the available localities we therefore choose one of the available versions. In view of its best results on the customer-base data, we use the regional version of the model (2.4.1.2).

A comparison according to criterion (6) is provided by Table 2. Criterion (5) gives the same ordering and is therefore omitted to save space. A comparison according to the remaining criteria of Section 4.1 is given in Table 3.

Table 3: Comparison of the tested localities according to criteria (7), (8) and (9). Each criterion has its own ordering of the localities (column L.).

L.   Err_3(l)     L.   Err_4(l)     L.   Err_5(l)
7    48.730       8    3435.258     2    95.322
8    28.115       7    1268.978     3    49.797
3    25.838       9     479.103     7    43.227
2    23.602       2      59.767     6    27.201
1    21.886       3     327.209     8    27.104
4    20.512       4     312.672     1    16.246
6    16.505       1     307.173     4    11.708
9    16.209       6     293.452     9     9.710
5    12.386       5       0.000     5     0.000


Looking at the orderings under the first three criteria, localities 8 and 7 come out on top (i.e. as the worst). In locality 8, however, suspicious behaviour of the input values is observed, as Figure 2 shows. It appears that from about October 2008 to October 2009 the measurement dropped out at one or more of the four input points of the locality. The quality of the input meters in this locality therefore needs to be checked.

A similar problem can be observed in locality 7 (figure omitted to save space). Leaving these localities aside, increased attention can be paid to localities 4 and 5 according to criterion (6), or to localities 2 and 3 according to the other criteria.

On the basis of the presented results it has now been decided to fit localities 7 and 4 with interval metering at all input and output points and to review the interval metering of the input in locality 8.



Figure 2: Comparison of the time course of the input and of the total consumption estimated by the GAMMA model in locality 8 over the available period (horizontal axis: date).

5. Conclusion

The paper introduced the problems of transport, metering and estimation of natural gas consumption worldwide and (especially) in the Czech Republic. It then presented a project in which working groups from ÚI AV ČR, v.v.i., and RWE GasNet, s.r.o., jointly address the detection of increased losses in closed localities.

The project is in its initial phase, in which the methodology for evaluating loss estimates is being developed. Various criteria for evaluating the loss processes are being tested and, on their basis, the localities with the greatest problems are identified. These problems may be caused both by errors in the supplied data (e.g. outages of the input measurement, errors in the interval metering of large customers' consumption, etc.) and by genuinely increased losses. Other, so far unforeseen influences may of course also play a role.

Putting the method into practice (routine processing of hundreds to thousands of closed localities across the entire distribution network) must be preceded, besides massive cleaning of the customer database, at least by testing on measured loss data, which entails fitting selected localities with interval metering (all inputs and off-take points). These localities will be selected on the basis of the results of the current phase of the project.

Acknowledgements

The author thanks his supervisor Emil Pelikán for revising the text and for valuable factual and formal comments, and further

Abstract

This paper deals with approximate inverse preconditioning and with solving systems of linear algebraic equations $Ax = b$ with a sparse, symmetric and positive definite $n \times n$ matrix $A$. The preconditioning is based on an incomplete decomposition using the Gram-Schmidt process with the non-standard inner product induced by the matrix $A$. Incompleteness is achieved by dropping entries that are small in absolute value, or small relative to other computed quantities. The main goal of the paper is to show the connection among dropping in the described incomplete decomposition, the loss of A-orthogonality, and the convergence of the iterative solver, which is the conjugate gradient method. The results for a real-world problem are accompanied by the derived bounds for the loss of A-orthogonality.

1. Introduction 

Solving systems of linear algebraic equations is a crucial part of many problems in scientific computing. There are two basic approaches: direct and iterative solvers. By a direct solver we mean a general class of approaches based on Gaussian elimination. A direct solver often changes the sparsity pattern of the matrix by adding new nonzero entries, which are called fill-in. The amount of fill-in, and hence the resulting memory requirements, can be reduced by sophisticated reorderings of the original matrix, see, e.g., [12]. Direct solvers are typically very robust and often provide a rather accurate solution, but they may be expensive. Very often only a rough approximation of the solution is needed, which is why an iterative solver may be the method of choice. The term iterative solver covers a wide class of methods that converge to the solution in some precisely defined sense (minimization of the residual, minimization of the energy norm of the error, etc.). Some general iterative solvers converge to the solution only in an infinite number of steps, but several methods converge in exact arithmetic in at most $n$ steps (e.g. the conjugate gradient (CG) method, the generalized minimal residual (GMRES) method, etc.). As mentioned above, often only an approximate solution is sought, and it can hopefully be obtained in a few iterations. The conjugate gradient method minimizes the energy norm of the error. It is probably the most frequently used iterative method for solving linear systems with symmetric positive definite (SPD) matrices. We focus on linear systems arising from the finite element method. The matrix $A$ and the right-hand side $b$ contain errors, which may have the following main sources:

• the chosen partial differential equations do not describe the given reality exactly,

• the exact material parameters are not precisely known,

• error from model linearization,

• discretization error.

These sources of error induce perturbations $\Delta A$ and $\Delta b$, so that the solution of the perturbed system

$$(A + \Delta A)(x + \Delta x) = b + \Delta b \qquad (1)$$

is $x + \Delta x$, where $\Delta x$ can often be estimated from the perturbations of the initial system. Consequently, it may not be necessary to find the exact solution of the linear system (1). Instead, a solution with a small residual may suffice, and this can be easier to find. Just as a direct solver generally needs a suitable matrix reordering and other technical enhancements, an iterative solver needs, among other tools, a transformation called preconditioning in order to converge successfully. In our case, for SPD matrices, a preconditioner can be seen as an approximation of the inverse matrix that improves the spectral properties of the system. It is well known that the eigenvalues determine the convergence of CG.

Section 2 gives an introduction to preconditioning and a brief overview of preconditioning techniques, and then focuses on the generalized Gram-Schmidt process. Section 3 is devoted to the conjugate gradient method in general. Section 4 contains some experimental results for a test problem. Section 5 concludes the contribution and mentions future work.

2. Preconditioning 

As mentioned above, a preconditioner approximates the inverse of the matrix $A$. In practice we distinguish preconditioning from the left and from the right. Assume that $P$ is a preconditioner. Preconditioning a given linear system $Ax = b$ from the left can be written as

$$PAx = Pb. \qquad (2)$$

Similarly, the system preconditioned from the right can be written as

$$APy = b, \quad y = P^{-1}x. \qquad (3)$$

In both cases the notation is only symbolic, since the real implementation may differ. Formally, there is a problem for CG, because the products $PA$ and $AP$ do not preserve the symmetry of the original matrix $A$; to preserve it we have to combine both approaches. If we assume the preconditioner in the factorized form $P = ZZ^T$, we get the two-sided preconditioned linear system

$$Z^T A Z y = Z^T b, \quad y = Z^{-1}x, \qquad (4)$$

which may be obtained implicitly in the implementation. A successful preconditioner has to satisfy the following two requirements:

• nnz($Z$) (the number of nonzeros in $Z$) has to be small,

• the norm $\|Z^T A Z - I\|$ has to be small for a given sparsity pattern of $Z$.

The first requirement is connected to the internal computations in iterative methods. Small nnz($A$) and nnz($Z$) do not necessarily imply small nnz($Z^T A Z$), but since the product can be stored implicitly, the fast matrix-vector multiplications (matvecs) repeatedly applied in iterative methods guarantee fast computation and a small number of flops. The second requirement is connected to the spectral properties of the preconditioned matrix, which imply fast convergence of the iterative method.

2.1. Preconditioning techniques 

There are many ways to compute a simple preconditioner (see, e.g., [2]), which may also take a parallel computing environment into account. Here we deal with explicit preconditioners, which approximate the inverse, $P = ZZ^T \approx A^{-1}$. An interesting example is the SPAI approach [4], which minimizes the functional $\|I - PA\|_F$ and can be decomposed into $n$ much simpler problems. In addition, this approach is naturally parallel. For a general survey of preconditioning techniques see also [10]. Briefly, for SPD matrices we may use the incomplete Cholesky factorization (IC), see its explanation in [9], the modified incomplete Cholesky factorization (MIC), the incomplete Cholesky factorization with threshold (ICT) [1], the approximate inverse (AINV) [3] and the stabilized AINV (SAINV) [5]; the last two are based on the generalized Gram-Schmidt process. AINV uses a specific orthogonalization lying between the classical and the modified Gram-Schmidt process, and SAINV uses the modified Gram-Schmidt process. The last two algorithms are described in this contribution. An important subclass of incomplete decompositions is based on prescribing a more sophisticated pattern of the nonzero entries. This type of procedure provides level-based preconditioners.

2.2. Generalized Gram-Schmidt based preconditioners

The generalized Gram-Schmidt algorithm assumes an SPD matrix $A$ and an initial basis of linearly independent column vectors, the matrix $Z^{(0)}$, whose columns are A-orthogonalized against the previously computed vectors. The algorithm produces matrices $Z$ and $U$, where $U$ is upper triangular. In exact arithmetic the computed matrices satisfy the following identities:

• $Z^T A Z = I$,

• $Z^{(0)} = ZU$,

• $(Z^{(0)})^T A Z^{(0)} = U^T U$.

It is clear that for an upper triangular $Z^{(0)}$ we get the matrix $Z$ as the (unique) inverse upper triangular factor of $A$. In addition, for $Z^{(0)} = I$ the matrix $U$ is the Cholesky factor of $A$. The basic algorithm has many variants, e.g. classical (CGS), modified (MGS) or AINV orthogonalization, right-looking or left-looking, with or without pivoting, with or without iterative refinement, and their arbitrary combinations. To distinguish them from their exact-arithmetic counterparts, we denote quantities computed in finite precision arithmetic by an extra upper bar. The main versions of the algorithms can be written as follows.

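MGS algorithm

(1) for $k = 1 : n$
(2)   for $j = 1 : k-1$
(3)     $\bar{U}_{jk} := \langle \bar{z}_k^{(j-1)}, \bar{z}_j \rangle_A$
(4)     $\bar{z}_k^{(j)} := \bar{z}_k^{(j-1)} - \bar{U}_{jk}\bar{z}_j$
(5)   end for
(6)   $\bar{U}_{kk} := \langle \bar{z}_k^{(k-1)}, \bar{z}_k^{(k-1)} \rangle_A^{1/2}$
(7)   $\bar{z}_k := \bar{z}_k^{(k-1)} / \bar{U}_{kk}$
(8) end for


AINV orthogonalization

(1) for $k = 1 : n$
(2)   for $j = 1 : k-1$
(3)     $\bar{U}_{jk} := \langle \bar{z}_k^{(j-1)}, z_j^{(0)} \rangle_A / \bar{U}_{jj}$
(4)     $\bar{z}_k^{(j)} := \bar{z}_k^{(j-1)} - \bar{U}_{jk}\bar{z}_j$
(5)   end for
(6)   $\bar{U}_{kk} := \langle \bar{z}_k^{(k-1)}, z_k^{(0)} \rangle_A^{1/2}$
(7)   $\bar{z}_k := \bar{z}_k^{(k-1)} / \bar{U}_{kk}$
(8) end for


CGS algorithm

(1) for $k = 1 : n$
(2)   for $j = 1 : k-1$
(3)     $\bar{U}_{jk} := \langle z_k^{(0)}, \bar{z}_j \rangle_A$
(4)   end for
(5)   for $j = 1 : k-1$
(6)     $\bar{z}_k^{(j)} := \bar{z}_k^{(j-1)} - \bar{U}_{jk}\bar{z}_j$
(7)   end for
(8)   $\bar{U}_{kk} := \langle \bar{z}_k^{(k-1)}, \bar{z}_k^{(k-1)} \rangle_A^{1/2}$
(9)   $\bar{z}_k := \bar{z}_k^{(k-1)} / \bar{U}_{kk}$
(10) end for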


Notes: 

In line 3 we can see a significant difference among the algorithms: each of them uses different vectors to compute the inner product. In exact arithmetic this leads to the same results. The diagonal entries $\bar{U}_{kk}$ can be computed in several ways. Among the given algorithms, MGS gives the best results in finite precision arithmetic. For $Z^{(0)} = I$ the inner product in the AINV algorithm reduces to the Euclidean inner product (the vector being orthogonalized, $\bar{z}_k^{(j-1)}$, is multiplied by the row/column of the matrix $A$ selected via $e_j$). The CGS variant offers significantly better potential for parallel implementation. The initial basis $Z^{(0)} = I$ is the preferred choice when these (incomplete) algorithms are used for computing preconditioners. In finite precision arithmetic CGS and AINV behave similarly (they have the same error bounds, as we will show later). All these algorithms are breakdown-free for well-conditioned problems. In addition, all of them can be modified into square-root-free versions if we do not scale the vectors and set the diagonal entries of $\bar{U}$ to 1.
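The MGS variant can be sketched in plain Python for small dense matrices. This is a minimal illustration, not the implementation used for the experiments: the function names are hypothetical, matrices are stored as lists, no dropping is applied, and the small matrix in the usage example is made up.

```python
import math


def a_inner(A, x, y):
    """A-inner product <x, y>_A = y^T A x for a dense matrix A (list of rows)."""
    n = len(A)
    return sum(y[i] * sum(A[i][j] * x[j] for j in range(n)) for i in range(n))


def mgs_a_orthogonalize(A, Z0):
    """Generalized modified Gram-Schmidt.

    A  : SPD matrix as a list of rows.
    Z0 : initial basis as a list of column vectors.
    Returns (Z, U): Z as a list of A-orthonormal columns, U upper triangular
    (list of rows), so that Z0 = Z U in exact arithmetic.
    """
    n = len(Z0)
    Z = [list(col) for col in Z0]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k):
            # Line 3: the coefficient uses the already updated vector z_k^(j-1).
            U[j][k] = a_inner(A, Z[k], Z[j])
            # Line 4: SAXPY update.
            Z[k] = [zk - U[j][k] * zj for zk, zj in zip(Z[k], Z[j])]
        U[k][k] = math.sqrt(a_inner(A, Z[k], Z[k]))
        Z[k] = [zk / U[k][k] for zk in Z[k]]
    return Z, U


# With Z0 = I the factor U is the Cholesky factor of A (A = U^T U).
A = [[4.0, 2.0], [2.0, 3.0]]
Z, U = mgs_a_orthogonalize(A, [[1.0, 0.0], [0.0, 1.0]])
print(U)
```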

Pioneering work on the analysis of the standard Gram-Schmidt algorithm can be found in [7] and in the recent work [6], but the generalized Gram-Schmidt process as an algorithm for computing preconditioners has not been analyzed yet. In [11] we provided the following bounds for the described algorithms:

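CGS, AINV orthogonalization:

$$\|I - \bar{Z}^T A \bar{Z}\| \le \frac{O(n^{5/2})\,u\,\kappa(A)\,\kappa(A^{1/2}Z^{(0)})\,\kappa(Z^{(0)})}{1 - O(n^{5/2})\,u\,\kappa(A)\,\kappa(A^{1/2}Z^{(0)})\,\kappa(Z^{(0)})},$$

MGS:

$$\|I - \bar{Z}^T A \bar{Z}\| \le \frac{O(n^{5/2})\,u\,\kappa(A)\,\kappa(A^{1/2}Z^{(0)})}{1 - O(n^{5/2})\,u\,\kappa(A)\,\kappa(A^{1/2}Z^{(0)})},$$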

where $O(\cdot)$ denotes a low-degree polynomial in the problem dimension $n$ and $u$ is the unit roundoff. For instance, double precision floating point arithmetic has $u \approx 1.1 \cdot 10^{-16}$. The quality of the computed symmetric preconditioner can be assessed via the norm $\|\bar{Z}^T A \bar{Z} - I\|$ mentioned above. For the Gram-Schmidt process this norm can be seen as the loss of A-orthogonality of the column vectors of the matrix $\bar{Z}$. The matrix $\bar{Z}$ computed by the complete algorithm does not satisfy the requirement of small nnz($\bar{Z}$). The bounds in [11] can, however, be used to construct incomplete algorithms based on the already analyzed complete algorithms. Rounding error analysis is based on the worst case that can occur in finite precision arithmetic. For instance, for a general elementary floating point operation op and two numbers $x$ and $y$ in finite precision arithmetic we can write [8]:

$$|\mathrm{fl}[x\ \mathrm{op}\ y]| \le |x\ \mathrm{op}\ y|\,(1 + u), \qquad (5)$$

where fl[·] denotes computation in finite precision arithmetic. This rule can be applied to the algorithms at the level of the vector SAXPY updates (line 4 for MGS and AINV and line 6 for CGS). Consider an update of the vector $\bar{z}_k^{(j-1)}$. For every component it holds that

$$\bar{z}_k^{(j)} = \bar{z}_k^{(j-1)} - \bar{U}_{jk}\bar{z}_j + \delta_k^{(j)}, \qquad (6)$$

where $\delta_k^{(j)}$ is the rounding error, bounded using (5) as

$$|\delta_k^{(j)}| \le u\,|\bar{z}_k^{(j-1)}| + 2u\,|\bar{U}_{jk}|\,|\bar{z}_j|. \qquad (7)$$

Arithmetic with a higher unit roundoff $u_{new}$ can be simulated using the following generic rule:

(1) mask $= |\bar{z}_k^{(j)}| \ge u_{new}\,(|\bar{z}_k^{(j-1)}| + 2\,|\bar{U}_{jk}|\,|\bar{z}_j|)$

(2) $\bar{z}_k^{(j)} = \bar{z}_k^{(j)}$ * mask (componentwise)

Note that mask is a vector of boolean values. A component is set if the corresponding component of the result $\bar{z}_k^{(j)}$, computed in double precision arithmetic, is greater than or equal to the simulated roundoff error. The second line drops the components whose entries in mask were not set. In this way we obtain the incomplete GS algorithms.
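The generic dropping rule can be illustrated by a short Python sketch of one SAXPY update followed by the mask. The function name and the example vectors are hypothetical; the vectors are plain lists, and `u_new` plays the role of the simulated unit roundoff.

```python
def saxpy_with_dropping(z_prev, u_jk, z_j, u_new):
    """One update z_k^(j) = z_k^(j-1) - U_jk * z_j, followed by dropping every
    component smaller than the simulated roundoff
    u_new * (|z_k^(j-1)| + 2 |U_jk| |z_j|)."""
    updated = [zp - u_jk * zj for zp, zj in zip(z_prev, z_j)]
    result = []
    for zn, zp, zj in zip(updated, z_prev, z_j):
        simulated_roundoff = u_new * (abs(zp) + 2.0 * abs(u_jk) * abs(zj))
        keep = abs(zn) >= simulated_roundoff  # one entry of the boolean mask
        result.append(zn if keep else 0.0)
    return result


# The first component nearly cancels and is dropped for a large u_new.
print(saxpy_with_dropping([1.0, 1.0, 0.0], 1.0, [0.999, 0.0, 0.0], u_new=0.01))
```

A larger `u_new` drops more entries and gives a sparser factor, at the price of a larger loss of A-orthogonality.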

The previous dropping approach can be extended to the computation of inner products. For example, the roundoff error in computing the inner product in the MGS variant is given by

$$\bar{U}_{jk} = \langle \bar{z}_k^{(j-1)}, \bar{z}_j \rangle_A + \Delta U_{jk}, \qquad (8)$$

where, using (5), we get

$$|\Delta U_{jk}| \le 2n^{3/2}u\,\|A\|\,\|\bar{z}_k^{(j-1)}\|\,\|\bar{z}_j\|. \qquad (9)$$

For the other algorithms, dropping at the level of the orthogonalization coefficients can be done similarly.

A reasonable limit on $u_{new}$ that we can use is given by the
loss of orthogonality bounds. It is desirable to have the loss of
A-orthogonality $\|Z^T A Z - I\| < 1$. For simplicity, the
numerator has to be < 1 and the denominator > 0. For our loss of
orthogonality, both bounds lead to the same conditions. The
low-degree polynomial $O(m^{3/2})$ arises from the worst-case
roundoff; for low-dimensional problems it behaves as ≈ 1, as we
can see in [11]. Substituting $u_{new}$ for $O(m^{3/2})u$ and
using the assumption on the loss of A-orthogonality, one gets the
condition

Uneu,K{Af^^ < 1 , ( 10 ) 

but this condition is very strict and does not provide a sparse
preconditioner, as will be shown later in the text.

Summarizing the previous lines, we introduced a dropping strategy
that behaves in the same way as the derived bounds for rounding
errors. Note that many dropping strategies exist for different
preconditioners, but most of them have not been analyzed and it
would be difficult to do so.


3. Conjugate gradient method 

The conjugate gradient method (CG) is an iterative method based
on the Lanczos process which belongs to the class of Krylov
subspace methods. The solution approximation $x_k$ in the k-th
step of the algorithm satisfies $x_k \in \mathcal{K}_k(A, r_0)$,
where $\mathcal{K}_k(A, r_0)$ is the k-th Krylov subspace
generated by the matrix A and the initial residual $r_0$. CG can
be seen as a procedure that minimizes the quadratic functional
$f(x) = \frac{1}{2} x^T A x - x^T b$, whose gradient is
$g = Ax - b$ (the negative residual vector). The standard CG
algorithm can be written as follows:


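(1)  $r_0 = b - A x_0$
(2)  $p_0 = r_0$
(3)  for $k = 0 : n - 1$
(4)      $\alpha_k = \dfrac{p_k^T r_k}{p_k^T A p_k}$
(5)      $x_{k+1} = x_k + \alpha_k p_k$
(6)      $r_{k+1} = r_k - \alpha_k A p_k$
(7)      $\beta_k = \dfrac{p_k^T A r_{k+1}}{p_k^T A p_k}$
(8)      $p_{k+1} = r_{k+1} - \beta_k p_k$
(9)  endfor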

CG preconditioned from both sides (4) can, after some
substitution, be written as follows:

(1)  $r_0 = b - A x_0$
(2)  $p_0 = Z Z^T r_0$
(3)  for $k = 0 : n - 1$
(4)      $\alpha_k = \dfrac{p_k^T r_k}{p_k^T A p_k}$
(5)      $x_{k+1} = x_k + \alpha_k p_k$
(6)      $r_{k+1} = r_k - \alpha_k A p_k$
(7)      $\beta_k = \dfrac{p_k^T A Z Z^T r_{k+1}}{p_k^T A p_k}$
(8)      $p_{k+1} = Z Z^T r_{k+1} - \beta_k p_k$
(9)  endfor

As mentioned above, if we do not compute the product $Z^T A Z$
explicitly, it is not necessary to have the preconditioner in
factorized form, because $P = Z Z^T \approx A^{-1}$. The CG
algorithm minimizes the energy norm of the error, which is not
known during the process, so a simple stopping criterion cannot
be based on this value. Instead, stopping criteria are often
based on the relative residual. Note that the norm of the
residual vector may grow locally.
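
A minimal sketch of the preconditioned iteration with a relative-residual stopping test is given below. It is not the implementation used for the experiments. It assumes a dense symmetric positive definite matrix stored as a list of rows and a zero initial guess; the names `preconditioned_cg` and `apply_precond` are hypothetical, and `apply_precond` stands for the action of $P = ZZ^T$ on a vector.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def preconditioned_cg(A, b, apply_precond=None, tol=1e-10, max_steps=None):
    """Return (x, steps) for A x = b, with A symmetric positive definite."""
    n = len(b)
    if apply_precond is None:
        apply_precond = lambda r: list(r)  # identity, i.e. plain CG
    x = [0.0] * n
    r = list(b)                            # r_0 = b - A x_0 with x_0 = 0
    p = apply_precond(r)                   # p_0 = P r_0
    b_norm = dot(b, b) ** 0.5 or 1.0
    steps = 0
    for _ in range(max_steps or n):
        if dot(r, r) ** 0.5 <= tol * b_norm:   # relative residual test
            break
        Ap = matvec(A, p)
        pAp = dot(p, Ap)
        alpha = dot(p, r) / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_precond(r)
        beta = dot(Ap, z) / pAp            # equals p^T A z because A is symmetric
        p = [zi - beta * pi for zi, pi in zip(z, p)]
        steps += 1
    return x, steps
```

For example, `preconditioned_cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` solves a 2x2 system in at most two steps; a diagonal (Jacobi) preconditioner can be passed as `apply_precond=lambda r: [r[0] / 4.0, r[1] / 3.0]`.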

4. Test problem

As a test problem we chose the matrix BCSSTK07 with
dimension n = 420 from MatrixMarket [13]. The condition
number of this matrix « 1.2 • 10^. In this section we will 
discuss convergence of CG preconditioned by AINV
and SAINV with respect to the loss of A-orthogonality
of the column vectors of the factorized preconditioner.
As the matrix of initial column vectors we use the identity
matrix I in all cases in this section. All presented results are
computed using dropping on the level of vector updates
(not on the orthogonalization coefficients).



Figure 1: Loss of orthogonality for the AINV algorithm.

Figure 2: Convergence for CG preconditioned by AINV.

Convergence of CG is similar (Figures 2 and 4) until 
drop tolerance is equal to 1 • 10“"^. Differences can be 
found for the drop tolerances higher than 1 • 10“®. In 
both cases the A-orthogonality is completely lost as we 
can see in the corresponding Figures (1) and (3). 



Figure 3: Loss of orthogonality for the SAINV algorithm.

Figure 4: Convergence for CG preconditioned by SAINV.

Whereas SAINV converges for drop tolerance 1 • 10“^ 
in about 50 steps, AINV needs about 200 steps for the 
corresponding nnz(Ž), see Table (1). 


Matrix bcsstk07

$u_{new}$               AINV nnz(Z)   SAINV nnz(Z)
0.0                     82722         82722
$1.0 \cdot 10^{-10}$    82648         82648
$1.0 \cdot 10^{-9}$     82646         82646
$1.0 \cdot 10^{-8}$     82626         82626
$1.0 \cdot 10^{-7}$     82513         82512
$1.0 \cdot 10^{-6}$     81773         81752
$1.0 \cdot 10^{-5}$     78573         78418
$1.0 \cdot 10^{-4}$     68843         68391
$1.0 \cdot 10^{-3}$     52730         44628
$1.0 \cdot 10^{-2}$     32173         16559
$1.0 \cdot 10^{-1}$     7924          5648

Table 1: Number of nonzeros in the factor Z for given $u_{new}$
computed by AINV and SAINV.
We obtain the matrix $\bar{Z}$ column by column, so that in the
i-th step we have the matrix $\bar{Z}_i$ with i columns. For the
given drop tolerances we show how the eigenvalues of the matrix
$\bar{Z}_i \bar{Z}_i^T$ evolve and converge to the eigenvalues of
the matrix $A^{-1}$ during the computation. Finally, we compare
the spectral properties of the matrix $A^{-1}$ with those of the
computed matrix $\bar{Z}\bar{Z}^T$, which is computed using the
incomplete Gram-Schmidt algorithms.



Figure 5: Evolution of the eigenvalues for the AINV algorithm.

Dotted horizontal lines show the eigenvalues of the inverse
operator $A^{-1}$; x-marks show the eigenvalues of the matrices
$\bar{Z}_i \bar{Z}_i^T$. Dropping cancels the smallest
eigenvalues of the matrix A; therefore the spectrum of the
approximate inverse does not contain the largest eigenvalues of
the exact inverse, as can be seen in Figures 5 and 6. Although
the spectra of the preconditioners seem to be similar, the
results of SAINV are much better, and we currently do not know
why.
5. Conclusion

In this paper we have summarized the derived bounds for the
analysis of the generalized Gram-Schmidt algorithm and proposed a
new dropping strategy based on the analysis of the problem (local
bounds). Further, we tried to find a connection among the loss of
A-orthogonality, dropping, and the convergence of CG. As was
shown, the quality of the preconditioner is not given only by the
loss of A-orthogonality, because the preconditioner computed by
SAINV provides much better results than AINV even in cases when
the A-orthogonality among the column vectors of the factor
$\bar{Z}$ is completely lost, and for similar nnz($\bar{Z}$). The
presented dropping strategy (dropping on the level of vector
updates) does not allow nnz($\bar{Z}$) to be controlled in a
continuous way. This work is not yet finished; it shows only the
most recent results and outlines the future direction of our
work.
Abstract

This presentation introduces an approximation heuristic suitable for solving a certain class of partially observable
Markov decision processes (POMDPs), developed by the author in his master's thesis. The POMDP
framework is generic enough to represent any real-world stochastic process; however, an exact solution is
computationally tractable only for the simplest models. The presented heuristic is based on decomposing the process into
non-disjoint subprocesses, each of which is significantly dependent on only a limited number of other subprocesses,
thus reducing the superexponential nature of the generic problem at the cost of ignoring some weaker dependencies
within the stochastic process. Although the original idea is applicable to many POMDP problems and solution
algorithms, an example application and implementation is presented, including some test results.
Abstract

The aim of this research is to prepare deterministic
optimization methods for managing the integration
architecture of information systems (IS) in healthcare.
The methodology under preparation should provide an
apparatus for the structured evaluation and comparison
of partial IS integration designs. The expected benefit
is a reduction in total cost of ownership (TCO) and an
increase in the flexibility of the environment (TTM,
Time to Market). When applying and combining patterns
and proven solutions, the architect must always take
into account information about the environment in which
the integration is being built. It is precisely the
objectification of the evaluation of the environment
that should simplify this task, and an example of it is
the subject of this article.

1. Communication

For the purposes of this work, let us define two basic types of
communication, which will be structured in more depth later:

• local communication is defined as a controlled exchange of
data between two programs running on the same hardware.

• network communication is defined as a controlled exchange of
data between two programs running on different machines
connected by a computer network.

A further motivation for the presented research is the fact that
local communication differs in many respects from communication
over a network. This simple statement has a fairly complex set
of causes and consequences. As will be shown, neglecting this
statement and its consequences leads to unsuitable integration
solutions. Some technical standards for integration over a
network evolved from standards for synchronous local
communication. When these standards are extrapolated to
integration over a network, it can be said almost with certainty
that designing a distributed system in the same way as a local
one will have catastrophic consequences.

An overview of the basic differences between local and network
communication is given in Table 1. Let us now look at the
individual aspects in more detail:

Reliability of the communication infrastructure

When considering reliability, we should take into account
parameters such as the state of the hardware, the amount of
process concurrency in the given environment, the topographic
distance between the communicating processes, and so on. In a
local environment we most often operate at the level of a single
operating system (OS) or middleware, or at the level of shared
hardware on which virtual machines are defined by means of
virtualization technology. The essential fact is the existence
of shared hardware. Hardware reliability is almost binary here
(the machine either runs or it does not, and any faults are
shared), in contrast to a network connection, where, first, the
processes run separately and, second, a number of additional
active elements take part in the communication. Detecting error
states and alerting is an order of magnitude simpler locally
than in a network. The number of concurrently running processes
is also typically smaller than in communication over a generally
shared network infrastructure.

Communication speed

In a local environment we also practically neglect the distance
between the communicating processes, because they share the same
hardware and their communication is limited by the width and
clock rate of the bus. With today's hardware we are in the range
of tens to hundreds of Gbps. In network communication, by
contrast, real values are orders of magnitude lower, most often
in the hundreds of kbps and at most in units of Mbps. We are
speaking here of real communication between processes at the
application layer (so-called end-to-end), not of the speed of
the transmission networks, which can be much higher.

Technological diversity

Integration designs between systems must allow for differences
in operating systems, programming languages, middleware
platforms, encodings, and data formats in general. Network
solutions must be prepared for all of these potential problems.
Today there is a range of platform-independent standards based
on XML [10], but their use has its limits: they are not
implemented in all technologies, and they may not be usable in
all cases, mainly for performance reasons.

Differing administration regimes

This concerns primarily the organizational support of the
integrated systems and of the integration technology (including
the network). Clearly, any two IS integrated over a network may
belong, and often do belong, to different companies. Different
companies are characterized by a different degree of ICT
penetration into their environment, including a different number
of organizational processes supported by IT. This degree,
together with the size of the company, indicates two kinds of
differences in IT.

First, there is the use of different technologies. For example,
a general practitioner working alone will certainly not buy a
64-core server requiring 2 kW of cooling and an optical link
with WDM. Conversely, a large teaching hospital cannot afford to
build its server infrastructure on desktop PCs, nor to use ADSL
or 802.11g connections for network access. The differing level
of technologies used then also implies different options for
connecting the company with external parties. This creates the
risk that no approach acceptable to both parties can be found.

Second, there is a difference in the administration regime, that
is, in staffing. Larger companies generally have higher
availability of their ICT infrastructure and can also
demonstrate better quality of their ICT services. On the other
hand, flexibility in the face of change tends to be markedly
lower in larger companies.

This work does not focus on the organizational support of the
operation and development of integrated systems. More
information on architecture management can be found, for
example, in [2].


2. Integration styles

Today we distinguish four basic integration concepts [1], which
cope with the aspects of integration described above with
varying degrees of success:


Batch File Transfer (BFT)

Batch file exchange is the simplest way of communicating data.
The source system creates a file containing control commands or
data and stores it on persistent storage (e.g. a disk array).
The file is transferred to the target system, interactively or
automatically, where the target system reads it. Both one-way
and two-way communication can of course be implemented. Besides
the content of the data, a number of operational parameters must
be agreed on and respected, such as file names, time of
exchange, location, deletion of files, and mechanisms for
handling error states.

Shared Database (SDB)

Represents the archetype of two or more systems sharing a single
data store in real time. In practice this is most often a common
database (DB), but under suitable conditions memory or disk
space can be used as well. It need not be integration with an
exchange of data, because the integrated systems physically
share one store.

Remote Procedure Call (RPC)

Represents a model in which one system offers a certain function
through a network interface and another system calls it. It
arose by evolution from local procedure or function calls within
one machine or system to calls within a computer network. It
extrapolates the concept of a synchronous blocking operation
from the environment of a single computer to an infrastructure
of separate machines connected by a network, and brings the
problems described in the introduction of this article. It is
synchronous, blocking communication in real time.

Messaging (MS)

Uses dedicated software to deliver messages. The sender hands
the message over to the MS and can continue with its own work.
The MS is responsible for delivering the message. This
communication is asynchronous and non-blocking. The concept of
asynchronous communication arose precisely in response to the
impossibility of accessing systems connected by a network in the
same way as local systems. Unfortunately, in many places it has
still not been generally adopted in practice. Once the volume
and number of communications grow beyond a sustainable limit, it
is no longer appropriate to program messaging by hand. Today, a
product from the Message Oriented Middleware (MOM) category [1]
is used almost exclusively for this purpose. MOM then forms the
transport foundation of the Enterprise Service Bus (ESB) model
[5]. An analysis of ESB is beyond the scope of this article.
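
The difference between a blocking call and handing a message over to a messaging layer can be illustrated with a small in-process sketch. It assumes nothing about any real MOM product: a standard-library queue merely stands in for the messaging system, and the function name `run_messaging_demo` is hypothetical.

```python
import queue
import threading


def run_messaging_demo(messages):
    """Send messages through a queue; return (sender_log, delivered)."""
    channel = queue.Queue()   # stands in for the messaging system
    delivered = []

    def receiver():
        while True:
            msg = channel.get()
            if msg is None:   # sentinel: no more messages
                break
            delivered.append(msg)

    worker = threading.Thread(target=receiver)
    worker.start()

    sender_log = []
    for msg in messages:
        channel.put(msg)      # hand over and continue, no waiting for the receiver
        sender_log.append("sent " + msg)

    channel.put(None)
    worker.join()             # only the demo waits, so the result can be inspected
    return sender_log, delivered
```

The sender never waits for the receiver to process a message; in an RPC-style call it would block until the result came back.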

The four styles listed cannot be regarded as different
evolutionary levels. There is no ordering among these categories
that would say anything about the quality of a given style in
itself. The basic choice of integration style (still without
reference to any further patterns or proven practices) must
always be determined by the current conditions, that is, by the
maturity of the ICT infrastructure of the integrated companies
and, above all, by the requirements on the particular data
exchange. Generalizing to a single style is appropriate only
from a certain level of complexity and can never be complete
(i.e. applied dogmatically).


PhD Conference ’11


82


ICS Prague



Daniel Krsička


Objektivizace charakteristik integrace IS ve zdravotnictví

2.1. The time dimension of communication

The essential piece of information about the basic integration
styles listed above is whether the communication is synchronous
or asynchronous (see Table 2). We are mostly concerned with the
integration of systems over a computer network. A perfect
division is not possible, because communication takes place on
several layers. Some use blocking operations and are therefore
synchronous, others do not. The division must therefore be set
by convention, based on the following assumptions:

• We define the division into blocking and non-blocking
operations at the application layer, i.e. at the layer of the
calling and the receiving process ¹.

• We assume that the subsystems implementing the lower layers
allow at least quasi-parallel processing of multiple requests.
That is, no application process is cut off from resources for a
significantly long time.

All of the styles listed can be used both for obtaining
information on demand (request/response) and for proactive
publication of information (one-way). The integration style
therefore implies neither the direction of communication nor the
type of information exchanged.

3. Distance between integrated systems

To be able to take the topographic distribution of the
individual IS into account when analysing an integration
scenario, it is necessary to introduce the notion of distance
between integrated systems. Distance must be an ordinal quantity
with a strict total ordering, i.e. a binary, irreflexive,
transitive, antisymmetric ordering relation must be defined on
its range of values [16]. For objective evaluation it is also
necessary to define all values so that individual scenarios can
be compared with one another. For the purposes of this work we
propose the following categories A to F:

A Communication between processes of one OS by means of shared
memory.

B Communication between processes of one OS through the local
network interface (loopback), or between two virtual machines on
the same hardware (virtual network).

C Communication in a LAN/SAN over a switched network, including
L3+ switching; i.e. this covers communication over a network
containing only active elements that compute over associative
memory (CAM).

D Communication in a LAN/SAN over a routed network, i.e. through
active elements that compute on a CPU. Packet inspection is
significant for inclusion in this category, so firewalls and IPS
systems that check the headers of higher-layer protocols belong
here as well.

E Communication over a dedicated (leased) WAN; beyond what is
listed in category D, this adds transport delay on the network,
bandwidth limits, and protocol conversion latency.

F Communication over the Internet, i.e. a WAN connection with no
guarantee of availability and no possibility of applying QoS.
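
One way to see what a strict total ordering on the categories means in practice is to encode them as ordered constants. The sketch below is purely illustrative: the class name `DistanceCategory`, the helper `farther`, and the integer codes 1 to 6 are assumptions and are not the distance values used in the article.

```python
from enum import IntEnum


class DistanceCategory(IntEnum):
    """Categories A to F, ordered from nearest to farthest."""
    A = 1  # shared memory within one OS
    B = 2  # loopback or virtual network on the same hardware
    C = 3  # switched LAN/SAN
    D = 4  # routed LAN/SAN, including packet inspection
    E = 5  # dedicated (leased) WAN
    F = 6  # Internet


def farther(first, second):
    """Return the category that represents the greater distance."""
    return max(first, second)
```

Because `IntEnum` members compare like integers, any two distinct categories are comparable, no category is smaller than itself, and the comparison is transitive.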

Each of the groups listed is characterized by a different
transmission bandwidth available to the communicating processes.
For example, communication between two applications through
shared main memory will certainly run faster than communication
within a company's computer network, let alone a call over the
Internet from, say, the Czech Republic to Australia.

The categories above can be roughly characterized by at least
three attributes: the available transmission bandwidth f, the
transport delay on the network t, and a measure a expressing the
number of parallel or colliding transmissions on the network. At
the level of the presented model we do not consider colliding
transmissions in the sense of sharing one local network band,
but rather as the probability that, in a given time interval,
the data transfer will not be significantly disturbed by another
communication process using the same transmission
infrastructure. Infrastructure here means the entire
communication path in all its layers and along its entire
length. For the same reason, the transport delay cannot be
defined as a quantity inversely proportional to the available
bandwidth, because, especially for communication with a "longer
path", i.e. with transfer across large networks, the linearity
of the relation between bandwidth and latency is broken by the
use of many active elements.

The transmission bandwidth f can be expressed in various units;
for our purposes we choose Hz and thus neglect differing
transmission encodings. On the other hand, the frequency values
given below are always related to the width of the available bus
(e.g. the base frequency multiplied by the bit width of the bus
for local communication, etc.). The transport delay t is the
second parameter, expressed in seconds. It represents the
latency between the sending of a message by the sender
(application/system) and its receipt at the application layer of
the recipient. It is thus the delay between two processes. The
last parameter is the measure of availability of the
transmission bandwidth a, which is related to the reliability of
the communication infrastructure discussed in the introduction
of this article.


¹ It should be added that it is possible, and often appropriate, to implement a synchronous request/response scenario by means of MOM. Messaging implies
asynchronous mechanisms on layers lower than the application layer.

3.1. Computing the distance

We assigned values to the quantities above. The order of
magnitude of the bandwidth f and of the delay t was set on the
basis of the parameters of technologies in use today. The
availability measure a is set by convention as a real number
from the interval (0; 1), where the value 1 expresses a 100%
reservation of the bandwidth. The distance d between integrated
systems can then be computed as (1):

$d = \log \dfrac{\frac{1}{f} + t}{a}$    (1)

In the computation we use the reciprocal of f so that the
distance decreases as the transmission bandwidth grows. The
transport delay t, by contrast, is added linearly. We divide the
resulting expression by the availability measure a because of
its range of values, which is defined as (0; 1).

Evaluating the expression $(\frac{1}{f} + t)/a$ for actual
frequency and latency values yields values that differ from one
another by 8 orders of magnitude in the extreme cases. Further
computation with such values is impractical, so it is
appropriate to transform them with a logarithm. ²

3.2. Normalizing the distance

For practical use it would be appropriate to transform the real
numbers that result from the distance computation onto a
normalized scale. We normalize the distance to the interval
(0; 1). To do so, we must determine the limit values of the
computed distance corresponding to the theoretical values 0 and
1. The limit values are given in Table 3.

The resulting normalization is then carried out by shifting the
values into $\mathbb{R}^+ \cup \{0\}$ and projecting them onto
the interval (0; 1). In general, the normalization therefore
looks as follows (2):

$d_{norm} = \dfrac{d + |d_{min}|}{|d_{min}| + |d_{max}|}$    (2)

Then all values $d_{norm}$ for inputs determined by categories
A to F (Table 3) fall into the interval (0; 1), and for the
limit values $d^{min}_{norm} = 0$ and $d^{max}_{norm} = 1$ hold.

3.3. Determining values for different environments

What follows is the computation of the normalized distance for
the individual distance categories. ICT infrastructure differs
from one organization to another depending on its size, the
number of processes supported by IT, the number of users, the
amount of data managed, the number of integrated partners, the
degree of legislative regulation, geographic location, and other
parameters. We have therefore divided the example computation
into three separate units. Each unit characterizes a company of
a specific size.

SOHO (Small Office Home Office)

Small organizations with units to tens of users. Examples are
general practitioners' surgeries, pharmacies, and the like. We
assume the use of low-end devices for network communication, a
flat local network structure, and an ordinary broadband Internet
connection. We expect neither leased WAN links nor a dedicated
server infrastructure with special server hardware. The values
of the input quantities and the computed normalized distances
can be found in Table 4.

Mainstream

Medium-sized companies with tens to hundreds of employees. These
may be local hospitals, polyclinics, small research institutes,
smaller insurers, or healthcare registries. In the mainstream we
assume centralization of dedicated servers in data halls,
hierarchical switched networks, and the possible existence of
leased lines to partner companies. IPS systems and application
firewalls may be present. The values of the input quantities and
the computed normalized distances can be found in Table 5.

Enterprise

Large companies with hundreds to thousands of users: teaching
hospitals, central registries, large insurers, government
bodies, etc. We expect high-end computing resources, specialized
SAN networks, optical links, and dedicated lines connecting
detached sites. The values of the input quantities and the
computed normalized distances can be found in Table 6.


² A variant with the logarithm of the square root of the expression was also tested; however, the results then differ too little from one another in order of
magnitude, and the information about the proportion between the individual categories is suppressed. The proportionality of the values may be important when
the distance is applied, so the variant without the square root was chosen.
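
The distance computation of Section 3.1 and the normalization of Section 3.2 can be sketched as follows. Several points are assumptions rather than facts from the article: the base-10 logarithm (the text does not state the base), the reading of the normalization formula as (d + |d_min|) / (|d_min| + |d_max|), which maps the limits to 0 and 1 only when d_min ≤ 0 ≤ d_max, and the function names `distance` and `normalize`. The input values in the usage note are made up for illustration and are not taken from the article's tables.

```python
import math


def distance(f_hz, t_s, a):
    """Distance d = log10((1/f + t) / a) for bandwidth f [Hz], delay t [s]
    and availability a in (0, 1]. The base-10 logarithm is an assumption."""
    if f_hz <= 0.0 or t_s < 0.0 or not (0.0 < a <= 1.0):
        raise ValueError("expected f > 0, t >= 0 and 0 < a <= 1")
    return math.log10((1.0 / f_hz + t_s) / a)


def normalize(d, d_min, d_max):
    """Shift d by |d_min| and scale by |d_min| + |d_max|.

    This is one reading of the normalization formula; it sends d_min
    to 0 and d_max to 1 only when d_min <= 0 <= d_max."""
    return (d + abs(d_min)) / (abs(d_min) + abs(d_max))
```

For instance, `distance(1e9, 0.0, 1.0)` is about -9, while a slower, less available link such as `distance(1e6, 1e-3, 0.5)` gives a larger value, so the distance grows as bandwidth falls, delay rises, or availability drops.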


3.4. Using the normalized distance

The normalized distance depends on deterministic inputs
(bandwidth, latency, the agreed availability measure). Both the
computation and the normalization preserve the proportions
between the individual categories, so it should be possible to
use not only the ordering of the categories but also the
distance values directly in further computations. Because the
interval of values (the range of the distance function) is
fixed, the distance can also be used in cases where two
companies of different levels of maturity are being integrated.
The final normalization has no effect on computations with the
distance; it only makes them easier to carry out and improves
readability.

4. Coupling between integrated systems

The second property of integration solutions analysed in this
article is the coupling between the communicating end
applications. Whereas distance is an example of a characteristic
of the environment in which integration takes place, coupling is
shaped by the integrated systems themselves. Inter-application
coupling can be described by a set of properties (attributes).
Characterizing the coupling is important for an objective
description of an integration scenario, because different
systems can establish diametrically different couplings, and not
only on the basis of the way they are connected or their
distance (see above). With regard to coupling, we are interested
in the degree of dependence between the integrated systems. This
degree of dependence can be expressed by the number and
properties of the links between the communicating IS.

In software engineering, the most common classification
distinguishes loose coupling and tight coupling. Such a
classification is insufficient for formal objectification and
must therefore be elaborated further. The basis of the
classification model can be taken from methods for optimizing
the design of program code [3]. The architecture of integration
solutions shares a number of features with it, so we can base
the classification of coupling between components communicating
over a computer network on a model intended for programming on a
single computer. This model must, however, be developed further
so that we avoid automatically extrapolating the properties of
local communication to communication over a network, as noted
earlier [1]. Starting from the existing model, we can define the
following coupling categories. For comparison, each category is
accompanied by an example from both local code and an
integration solution:


Content Coupling (strong tight coupling)

The called component offers its functionality directly, i.e. the
caller directly invokes the executable code of the callee. The
caller must know exactly the structure in which the callee
accepts the call. An example is a call to a strongly typed
function in an imperative programming language (e.g. C), or a
function call through a socket, that is, a situation where no
higher protocol above TCP is used and the receiving program
interprets the transmitted data directly and always in the same
way (the data contain no control characters).

Common Coupling (shared data store)

Represents the archetype in which two or more systems share the
same data. It can be considered at the local level (memory) as
well as at the network level (a common DB). A defining factor is
also the need to know the data model and the access control
scheme exactly. Each integrating system holds and implements
this information. Examples are a local function call that passes
a parameter by reference (pointer), the creation of a shared
memory segment (between two OS processes), or the sharing of one
database by two or more applications.

External Coupling (externalized common coupling)

The previous type can be modified by exporting the information
about syntax to a common store. The export contains both the
data schema and the control schema. It is practically not used
on its own today, but it is an integral part of the massively
widespread cases of externalized web service schemas (XSD),
WS-* standards (policies etc.), or cascading style sheets of web
applications (CSS). Other information can be externalized as
well, e.g. the location of data access (tnsnames.ora for Oracle
DB and the like). The individual ISO/OSI layers must therefore
be considered.

Control Coupling (coupling with control)

A category in which the calling component attaches a control
flag (command) to the data content of the message, i.e. it
informs the callee how to handle the data. The caller therefore
need not know all the functions of the callee, and this set can
be handled dynamically. An example is any function that takes a
control argument. This can be created at the level of
programming language code, but it is equally used in modern
WS-* standards, e.g. the header <SoapAction ... />.

Stamp Coupling (freedom of the data schema)

This is the analogue of the previous case in the area of data.
The caller need not send all data attributes, only some of them,
and the callee is able to process the data (if the semantics of
the case allows it). Web services based on the SOAP standard
[14] make it possible to define mandatory and optional
parameters as well as, dynamically, their multiplicity in a
given message. Using the Stamp Coupling principle predisposes
the system to security weaknesses and generally increases the
demands on the robustness of the executable code.

Message Coupling

Communication takes place through an intermediary that loosens
the mutual dependencies between the communicating systems. This
does not mean communication by messaging, i.e. a technology for
asynchronous persistent communication, but an element of the ESB
type [14]. Several levels of maturity of Message Coupling can be
found; their analysis is, however, the subject of ongoing
research and is beyond the scope of this article. Between local
programs, hand-over through shared memory with OS intervention
can be used (e.g. pipelines on *nix). In network integration,
examples include commercial products such as IBM WMQ, products
with higher-level logic such as IBM WESB, SAP-PI, MS BizTalk,
BEA WebLogic, ..., and also ESB solutions intended for
healthcare [4].


• Pokud komunikující systémy neznají vzájemně 
svou vnitřní strukturu, mluvíme o volně vazbě. 
Teoreticky by bylo možně snížit počty datových 
parametrů na 1 resp. 2 (in/out) a řídících na 0 a 
dosáhnout tak vazby c = 0, 5. Budování takových 
rozhraní je však kontraproduktivní. Volná vazba 
znamená i možnost změn v rozhraních bez nut¬ 
nosti změn mezilehlých a především protilehlých 
komponent. To nelze provést u nestrukturovaného 
rozhraní. 

• V závislosti na míře vyspělosti je u menších řešení 
(SOHO/Mainstream) vhodné budovat integrace s 
přímým řízením, kde komunikující strany přímo 
ovládají zpracování dat a naopak u výše zmíněné 
kategorie Enterprise dedikovat logiku zpracování 
na ESB [5]. A i dále je možné řídící atributy 
dělit na ty zpracovaně ESB a ty, které nesou in¬ 
formaci o sémantice dekódovatelné až konečným 
příjemcem. 


4.1. Ordinální vyjádření vazby mezi IS 

Stejně jako u vzdálenosti bude vhodně pro veličiny 
charakterizující vazbu mezi systémy definovat relaci 
ostrého uspořádání. Jeho úplnost je dílčím předmétem 
dalšího výzkumu. Předpokládáme ovšem, že bude nutné 
ustoupit od úplnosti a možná i od ostrosti uspořádání 
množiny definující kvalitu vazby. Tj. předpokládáme, že 
vazba mezi systémy nebude charakterizovatelná jedinou 
ordinální veličinou, ale minimálné dvěma. Důvodem je 
vzájemná kontradikce požadavků na volnost vazby a 
požadavků na výkonnost celěho řešení. Cílem tedy bude 
nalezení optimálního vyvážení těchto dílčích metrik. 

Zde uvádíme možné vyhodnocení míry závislosti 
mezi komunikujícími systémy pomocí množství 
vstupně/výstupních řídících a datových parametrů. 
Základ číselné reprezentace míry závislosti (síly vazby) 
mezi systémy lze založit na [3]. 


$$c_{plain} = 1 - \frac{1}{D_i + 2 \cdot C_i + D_o + 2 \cdot C_o} \qquad (3)$$

where D_i, C_i are the numbers of input data and control variables (call parameters) and D_o, C_o are the numbers of output data and control variables (response parameters).
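The following minimal Python sketch evaluates the measure, assuming formula (3) reads c = 1 - 1/(Di + 2·Ci + Do + 2·Co); this form agrees with the value c = 0.5 quoted earlier for one data parameter in each direction and no control parameters. The function name and the second example interface are illustrative assumptions, not part of the original text.

```python
def plain_coupling(d_in: int, c_in: int, d_out: int, c_out: int) -> float:
    """Coupling measure in the assumed form of formula (3).

    d_in, c_in   -- numbers of input data / control parameters
    d_out, c_out -- numbers of output data / control parameters
    Control parameters are weighted twice as heavily as data parameters.
    """
    weighted = d_in + 2 * c_in + d_out + 2 * c_out
    if weighted <= 0:
        raise ValueError("the interface must have at least one parameter")
    return 1.0 - 1.0 / weighted


# One data parameter in, one out, no control parameters.
print(plain_coupling(1, 0, 1, 0))
# A hypothetical richer interface: 3 data + 1 control in, 2 data + 1 control out.
print(plain_coupling(3, 1, 2, 1))
```

The sketch also makes the weakness discussed in the text visible: the value depends only on parameter counts, so it can be changed simply by adding or removing parameters.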

It is evident that this numerical representation can be influenced very easily by changing the number of data and control parameters. The historical model cited above, which was intended for evaluating the coupling between two locally running programs, therefore cannot be used in its original form and must be modified. Let us summarize the most serious shortcomings:


• Control information may be provided in varying quality, depending on its degree of standardization. An integration solution built on generally accepted standards [8] has a reliably higher probability of reuse and of robustness over time than one built on local encodings, code lists, signalling, etc. This statement applies to all layers, from transport protocols, through control information for communicating services [13], to standardization at the application level [9].

• From the point of view of application logic, communication between systems may be decomposed into several steps. The most trivial case is request/response communication; other variants already imply fully stateful communication with the need to maintain information about the session. Information about the number of states must therefore also be included in the evaluation of coupling looseness.


Based on the above, we extend the algorithm for computing coupling looseness with additional activating and inhibiting terms.

First, we introduce the degree of externalization of integration functions e as an ordinal discrete quantity expressing the capability of the ESB. The domain and the proposed values are given in Table 7. A prerequisite for assigning a particular environment to a given level is that it satisfies all properties of the lower levels, which is not always automatic, especially when process orchestration [6] is performed by a Business Process Engine [7].
aspect                                      | local communication | network communication
reliability of communication infrastructure | high                | low
communication speed                         | high                | low
technological diversity                     | low                 | high
different administration regimes            | low risk            | high risk

Table 1: Aspects of network communication



                             | synchronous  | asynchronous
real-time communication      | SDB, RPC, MS | MS
off-line batch communication | -            | BET

Table 2: Relation between the mode of communication and its execution in time


distance limit | bandwidth [Hz] | latency [s] | bandwidth availability | distance
min            | 10^?           | 10          | 10^-?                  | -8.995
max            | 10^?           |             | 0.99                   | 5.000

Table 3: Limit values of distance


case | bandwidth [Hz] | latency [s] | bandwidth availability | distance | normalized distance
A    | 10^?           | 10^-?       | 0.9                    | -6.954   | 0.146
B    | 10^?           | 10^-?       | 0.7                    | -4.845   | 0.297
C    | 10^?           |             | 0.1                    | -3.000   | 0.428
D    | 10^?           |             | 0.1                    | -1.000   | 0.571
E    | n/a            | n/a         | n/a                    | n/a      | n/a
F    |                | 1           | 0.0001                 | 4.000    | 0.929

Table 4: Computed values of normalized distance for the SOHO category


case | bandwidth [Hz] | latency [s] | bandwidth availability | distance | normalized distance
A    | 10^?           | 10^-?       | 0.9                    | -6.954   | 0.146
B    | 10^?           | 10^-?       | 0.7                    | -4.845   | 0.297
C    | 10^?           |             | 0.1                    | -3.000   | 0.428
D    | 10^?           | 10^-2       | 0.01                   | 0.000    | 0.643
E    |                | 10^-?       | 0.01                   | 1.000    | 0.714
F    | 10^?           | 1           | 0.0001                 | 4.000    | 0.929

Table 5: Computed values of normalized distance for the Mainstream category


case | bandwidth [Hz] | latency [s] | bandwidth availability | distance | normalized distance
A    | 10^?           | 10^-?       | 0.9                    | -7.954   | 0.074
B    | 10^?           | 10^?        | 0.7                    | -5.845   | 0.225
C    | 10^?           | 10^-?       | 0.1                    | -3.000   | 0.428
D    | 10^?           |             | 0.01                   | 0.000    | 0.643
E    | 10^?           | 10^-?       | 0.001                  | 2.000    | 0.789
F    |                | 1           | 0.0001                 | 4.000    | 0.929

Table 6: Computed values of normalized distance for the Enterprise category
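The definition of the normalized distance is not restated in this part of the text. The published values nevertheless appear consistent with min-max scaling between the two limits of Table 3 (-8.995 and 5.000). The Python sketch below checks that assumption against the Enterprise rows; the scaling formula itself is an inference from the numbers, not a quotation from the paper.

```python
D_MIN = -8.995  # lower distance limit taken from Table 3
D_MAX = 5.000   # upper distance limit taken from Table 3


def normalized_distance(distance: float) -> float:
    """Min-max scaling of a distance to the interval [0, 1] (assumed formula)."""
    return (distance - D_MIN) / (D_MAX - D_MIN)


# (distance, published normalized distance) for selected Enterprise cases
published = {
    "A": (-7.954, 0.074),
    "B": (-5.845, 0.225),
    "C": (-3.000, 0.428),
    "D": (0.000, 0.643),
    "F": (4.000, 0.929),
}

for case, (distance, reported) in published.items():
    print(case, round(normalized_distance(distance), 3), reported)
```

Case E of the Enterprise table is reported as 0.789, whereas this scaling gives about 0.786 for a distance of 2.000, so either the scaling differs slightly from the one assumed here or the printed value is damaged.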
externalization level | description                                                         | proposed metric value
P2P                   | ad-hoc integration. For n systems, max. integrations                | 1
adapters              | standardization of L4 protocols                                     | 2
ad-hoc messaging      | MOM for P2P connections                                             | 3
content-based routing | aggregation, publish/subscribe and routing based on control data    | 4
dynamic routing       | externalization of control rules (configurability)                  | 5
BPM                   | management of organizational processes by dedicated SW above the ESB | 6

Table 7: Quantification of the degree of externalization of integration functions outside the integrated applications


DTD language level | description                                    | proposed metric value
no DTD             | syntax is not externalized                     | 1
custom DTD         | syntax determined by a custom ad-hoc definition | 2
syntactic XSD      | use of CSA standardization [2], [11]           | 3
semantic XSD       | use of IS standardization [2], [9]             | 4

Table 8: Quantification of the degree of data standardization


Another important parameter is the degree of standardization of the data definition language s. It is again an ordinal discrete quantity. Its definition is given in Table 8.

The last parameter must be the number of communications within a session between the integrated systems, n. For simplicity we disregard the special case of stateless request/response communication and define the value of the parameter as the total number of data transfers regardless of direction. For one-way communication n = 1, for request/response n = 2, for two round trips n = 4, etc.

We introduce the above quantities into the computation of coupling looseness c by the following modification of formula 3:

$$c_{plain} = \frac{n}{e \cdot s} \left( 1 - \frac{1}{D_i + 2 C_i + D_o + 2 C_o} \right) \qquad (4)$$


It is possible that for practical use the values of e and s will have to be adjusted so that they better express the proportionality between the defined categories. Such an adjustment can be made only after test calculations on real scenarios, which have not yet been carried out. The values of these quantities directly affect the range of values of coupling looseness. For this reason it is not yet appropriate to propose a normalized coupling measure in the way we did for the distance between integrated systems (2).
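As an illustration of how e, s and n enter the computation, the Python sketch below assumes that formula (4) has the form c = n/(e·s) · (1 - 1/(Di + 2·Ci + Do + 2·Co)). The printed formula is damaged, so this form is an assumption. The value ranges for e (1 to 6) and s (1 to 4) follow Tables 7 and 8; the function name and the example scenarios are hypothetical.

```python
def extended_coupling(n: int, e: int, s: int,
                      d_in: int, c_in: int, d_out: int, c_out: int) -> float:
    """Coupling measure extended by session length and standardization.

    n -- number of data transfers within a session (1 = one-way, 2 = request/response)
    e -- externalization level of integration functions, 1..6 (Table 7)
    s -- standardization level of the data definition language, 1..4 (Table 8)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 1 <= e <= 6:
        raise ValueError("e must be in 1..6")
    if not 1 <= s <= 4:
        raise ValueError("s must be in 1..4")
    weighted = d_in + 2 * c_in + d_out + 2 * c_out
    if weighted <= 0:
        raise ValueError("the interface must have at least one parameter")
    return (n / (e * s)) * (1.0 - 1.0 / weighted)


# Request/response over an ad-hoc P2P link without any DTD.
print(extended_coupling(n=2, e=1, s=1, d_in=1, c_in=0, d_out=1, c_out=0))
# The same exchange over content-based routing with syntactic XSD.
print(extended_coupling(n=2, e=4, s=3, d_in=1, c_in=0, d_out=1, c_out=0))
```

Under this assumed form, more transfers per session raise the value, while higher externalization and standardization levels lower it, which matches the activating and inhibiting roles described in the text.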


4.2. Use of the normalized coupling measure

Unlike the normalized distance, the coupling measure expresses not only the properties of the existing environment but also touches on the design of the particular integration itself (inputs Ci, Di, Co, Do and n). It should therefore be possible to use the normalized coupling measure directly when evaluating combinations of integration patterns.

4.3. Performance degradation

A significant marker in the objective assessment of integration solutions is the degree of performance degradation, which is proportionally related to the looseness of coupling between the communicating IS. Loosening the coupling between systems forces both the structuring of interfaces, including data formats, and the use of additional intermediate communication elements working at higher layers of the ISO/OSI model [15]. Performance degradation is then caused mainly by:

• Longer creation and parsing of messages at all points of the scenario (caller, ESB intermediaries, callee). The analysis must consider the individual ISO/OSI layers; nevertheless, the contribution that is largest by orders of magnitude can be expected from work with XML documents caused by DOM transformation [12], and from the conversion of data types to and from string notation (see below).

• Increased time needed to transport information over the network because of intermediate elements that work directly with application data.

Evaluating performance degradation already indicates the need to split the calculation by the individual layers of the ISO/OSI model, which is beyond the scope of this article. The calculation of performance degradation will also include the normalized distance of the integrated systems (2).
5. Conclusion

We have presented an overview and analysis of some properties of the environment in which communication between information systems is built. We have shown the possibilities of structuring these properties and evaluating them ordinally. We have shown
Abstract

Infrastructure as a Service (infrastructure offered to the customer in the form of a service of the provider) is a deployment model which allows the data and computing capacity of a cloud to be utilized as a set of virtual devices and virtualized machines. Infrastructure as a Service can be offered separately to each project, while the same capacity of connected physical machines and devices is shared. Currently, the concept of Infrastructure as a Service is being tested on several projects within the activities of the CESNET association, the First Faculty of Medicine of Charles University in Prague, and the Music and Dance Faculty of the Academy of Performing Arts in Prague.

Current research in the field of computational physiology demands high computing capacity. The computational tasks are distributed to computers provided by the infrastructure. The project in the field of human voice analysis demands high throughput of the computer network between the acoustic or video device on the local side and the analytic application on the remote high-performance server side.

This paper describes the features of, and the main challenges for, an infrastructure dedicated to this type of application. Infrastructure as a deployment model of cloud computing might be beneficial for multi-domain teams and for the collaboration and integration of highly specialized software applications.

1. Introduction

The penetration of broadband Internet connections with a speed of at least 2 Mbit/s was about 95% in the Czech Republic at the beginning of 2011 [8]. Applications with higher demands on connection speed are therefore becoming more available to general users.

This work discusses examples of applications in biomedical research whose deployment relates to high-speed networks from different perspectives. Even though these applications demand a higher connection rate, they are not limited to use in and from the high-speed networks available in the academic community (e.g. the CESNET2 network in the Czech Republic). Several technologies allow these applications to be deployed and used effectively and dynamically.

2. Virtualization

Virtualization is a technology which provides separation between the software layer and the underlying hardware layer. It allows the execution of one or more so-called virtual machines sharing one physical hardware. Virtualization techniques introduce some overhead when translating the instructions of an isolated application to a lower level of the system; however, the performance penalty is generally small on the newest hardware and virtualization systems (VMWare, XEN, KVM, ...). Virtualization thus allows hardware capabilities to be consolidated into smaller units which may be utilized effectively.

3. Virtual infrastructure

Several virtual machines connected, e.g., via a virtual network routed over the physical network may form a virtual infrastructure. These virtual machines need not run on one physical machine; they may run on different, geographically dispersed physical machines. In contrast to a virtual infrastructure, a virtual organization is a set of users from different physical organizations who, for example, work on the same project or share the same data. Such a virtual organization may use a virtual infrastructure which is dedicated only to its purpose. Establishing a virtual infrastructure is easier with virtualization techniques.

The following figure shows an example of several virtual organizations and their infrastructures. The right part gives a schematic view of the physical connections among different organizations (hospitals, research institutions) via the academic network or the Internet. The physical resources are shown as vertices and the network connections as edges. Each amoeba-like shape connects virtual machines into a virtual infrastructure. The left part shows a physical server executing several virtual machines, each of which belongs to a different virtual infrastructure.



Figure 1: Illustrative scheme.


4. National grid and desktop grid

A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities [3]. Grids are used, e.g., for computation in high-energy physics. Additional effort is needed to administer and maintain the grid infrastructure. This task is typically carried out by a national grid initiative (NGI), and the grid infrastructure is shared among different independent users. The national grid initiative in the Czech Republic is maintained by the METACENTRUM activity of the CESNET association, which also coordinates the work with NGIs from neighbouring countries in the European Grid Initiative (EGI). In contrast to a national grid, an ad-hoc, voluntary, or so-called desktop grid system may be established. A well-known project is SETI@home [4], which follows the idea that anyone connected to the Internet can join the project and extend a voluntary grid by downloading a small client program and executing it in the background. This small program periodically asks for computational jobs and computes them, e.g. as a screen saver. The grid nodes are typically PCs owned by individuals [5] [6].

5. Cloud computing

Cloud computing is a model for enabling network access to shared computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction [7]. Cloud computing is offered in three different types: as a service, as a platform, or as a whole infrastructure. Infrastructure as a Service (IaaS) consists of data centers, computing resources and a network. As in grid computing, the user of an infrastructure typically isn't its owner and doesn't need to maintain the physical hardware. The cloud is currently offered as a public cloud by multiple vendors; a private cloud may be built using open-source or proprietary software (VMWare vCloud, Eucalyptus, OpenNebula, ...); and a hybrid cloud combines private and public cloud capabilities.

6. Voice signal analysis

The aim of the FONIATR project is to build a system which can analyze an input signal, such as the human voice or a video of the vocal cords, and provide graphical output which supports the decisions of specialists, e.g. a phoniatrist or an otorhinolaryngologist. On top of that, it should collect statistical information and voice samples with context information provided by the specialist for further analysis.

In the past, the application was deployed as a local installation on the user's working computer. This deployment model was changed from a local installation to a remote installation with remote access. Several technologies were considered; the currently used access over the Remote Desktop Protocol (RDP) keeps the application transparent in the sense that its user should not notice a significant change in use and behavior. Even the development of such an application doesn't need any changes unless there are specific requirements on the quality of the data transferred. RDP transfers mouse and keyboard events from the user's client application to the remote system, where they are interpreted, and graphical changes are transferred back to the client's application, which visualizes them. Because RDP lacks support for the redirection of sound recording, we introduced our own customization of the RDP protocol to redirect sound recording over RDP without loss of information [1].

The system currently consists of two distinct parts. The first part is the client application: a generic remote desktop client (part of the standard accessories of the MS Windows system, or the RDESKTOP program for Linux) and a plugin which adds a custom virtual channel, switches recording on the local microphone on and off, and redirects the digital sound signal through the RDP virtual channel.


The second part is an application on a configured server which can be accessed over the Internet. After logging into a remote session, this application starts instead of the generic desktop. The customized server's RDP plugin controls switching the recording on and off and receives the digital sound signal, which it writes to a file on the server's disk; it also provides an API for accessing the sound samples. This API is used by the analytical application to provide real-time analysis of the voice signal as well as post-processing analysis, which is done after recording is finished.
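The server-side step of storing the received sound signal and reading samples back can be sketched with Python's standard `wave` module. This is only an illustration under stated assumptions: the actual plugin, its API, the audio format (16-bit mono PCM here) and the function names `write_wav` and `read_samples` are not described in the text and are hypothetical.

```python
import io
import wave


def write_wav(target, pcm_chunks, sample_rate=44100, channels=1, sample_width=2):
    """Append raw PCM chunks (e.g. as received from a channel) to a WAV container.

    `target` is a file path or a writable, seekable binary file object.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample
        wav.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wav.writeframes(chunk)


def read_samples(source, start_frame, n_frames):
    """Return raw bytes of `n_frames` frames starting at `start_frame`."""
    with wave.open(source, "rb") as wav:
        wav.setpos(start_frame)
        return wav.readframes(n_frames)


# Hypothetical usage with an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
chunks = [b"\x00\x00" * 100, b"\x01\x00" * 50]   # two chunks of 16-bit samples
write_wav(buffer, chunks, sample_rate=16000)
tail = read_samples(io.BytesIO(buffer.getvalue()), start_frame=100, n_frames=50)
print(len(tail))
```

A real-time analysis would read from the growing recording while it is still being written, which needs additional coordination that this sketch leaves out.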

The server part of the application is deployed on several virtual machines, each accessible to a different set of users. Currently one is dedicated to development and testing with restricted access, and a second one to production with general access.


[Diagram labels: Internet; firewall; RDP; RDP virtual channel; client's RDP plugin; server's RDP plugin; WAV file; application running on server; window of remote desktop on client]

Figure 2: Scheme of the system for human voice analysis and remote recording via the RDP protocol.

7. Identification of physiological systems

The result of the project Identification of Physiological Systems is a web service which distributes computational tasks to desktop computers connected via the desktop grid system BOINC and the SZTAKI Desktop Grid API [2]. The scheme in the following figure shows the architecture of the system. The server operates as an independent virtual machine and contains a SOAP web service controlling the distribution of tasks over the BOINC middleware. Some of the BOINC workers operate as independent virtual machines. Some of the desktop computers in a laboratory and a classroom of the First Faculty of Medicine are connected to this desktop grid system. Other computers may easily be joined later. Current research focuses on the possibility of enhancing the computational capacity of the infrastructure with resources provided by the NGI or by involving GPU computing.



Figure 3: Scheme of the computational infrastructure for identification of physiological systems.

8. Discussion

Relatively independent projects with completely different types of users may share the same physical resources and may outsource tasks related to establishing and maintaining IT infrastructure. Virtualization and virtual infrastructures offer an effective way to do that. However, as seen in both cases, effort is needed to implement or adapt communication protocols, because the individual parts of the system are not deployed on a single machine and need to exchange data to work properly. Most of these protocols are standardized or can easily be extended by custom plugins. The introduced infrastructure can be characterized as a private cloud which is accessible to users from different communities related to biomedical research. No special tools are used to administer the cloud within the pilot infrastructure, because the number of projects is currently relatively small. Nevertheless, there are free and commercial products (Eucalyptus, OpenNebula, VMWare vSphere) which provide a set of tools to automate the maintenance of a private cloud, including virtual network configuration, live migration of virtual machines, etc. The important question is which type of application is suitable for clouds operating on physical resources spread across different geographical locations, compared with clouds operating in supercomputing centers. Clouds in supercomputing centers are suitable for highly parallel tasks which need fast communication between parallel computational tasks. A cloud operating on physical servers in different geographical locations can offer free capacity in periods when the owner doesn't utilize its physical resources and offers them to other users of the cloud.


9. Conclusion

It is possible to operate a private cloud on the physical infrastructure and to provide virtual infrastructure to users, who can utilize it to execute their own applications and systems. Infrastructure as a Service can open access to distributed systems to a larger number of users who have so far been prevented from using them by complicated administration and by the overly long process of purchasing and installing computing resources. Advanced applications of this type are available to any user via the Internet and have reasonable responsiveness over a broadband connection. A cloud operating on physical servers in different geographical locations can be a suitable complement to clouds in supercomputing centers.
Abstract

The Cascadic Conjugate Gradient Method (CCG, [Deuflhard 1994]) is a method for solving elliptic partial differential equations. In this contribution we present (a posteriori) estimates of the algebraic and discretization errors, describe the CCG method, and propose new stopping criteria for it. These are then compared in numerical experiments with the stopping criteria derived in the original paper.
Abstract

This paper presents the methodology used for the design of a short-term electricity consumption model with an hourly resolution. The key fundamental drivers (temperature, industry, seasonalities) are identified and their relative impact is estimated by statistical analysis. A neural model based on the negative correlation ensemble is presented. The final model is enhanced by a linear auto-regression correction. The percentage absolute error of the final model is about 1.2.

1. Introduction

Suppose we have an electricity transmission grid. This grid serves as the underlying infrastructure for transferring electrical power from electricity producers to consumers. On the supply side of this equation, there is a set of power plants of many different types and properties. On the demand side, there is a significantly higher number of electricity end users: households, industrial and transport facilities, public as well as private services, etc.

From the economic point of view, there are supply and demand, and thus a suitable place for a market. This market really exists, as a special form of commodity market. The speciality of the electricity market lies in the fact that the commodity itself, electrical power, cannot be stored efficiently. At every single moment, the total volume of generated power must follow the total amount of power consumed very closely; otherwise, the grid may collapse. That is why the market rules and mechanisms are set in a way that rewards behaviour of market participants that contributes to stability and predictability. Also, compared to other commodities, short-term consumer decisions are not affected by the price of the commodity. None of us switches off the light just because there is a temporary shortage in the grid.

These facts make it vital for all market participants, including power producers, distribution companies, and purely financial power traders, to be able to anticipate future levels of power consumption as a key market driver.

This article describes a methodology for building a model for short-term consumption forecasting. First, electricity consumption is rigorously defined in the introduction to Sec. 2, and the results of the initial statistical analysis of the main consumption drivers are presented in Sec. 2.1. Section 3 presents the neural model based on the negative correlation ensemble (Sec. 3.1), including a description of its inputs (Sec. 3.2) and outputs (Sec. 3.3). Section 3.4 discusses the model performance.

2. Consumption

The overall electricity consumption of a specified grid (in this case the Czech Republic) is a value estimated as the load of the grid reduced by the transmission losses, the pumped-storage consumption and the current import/export balance. The load itself is estimated as the total electricity generation at the moment minus the self-consumption of the sources.
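The two definitions above can be written as a short Python sketch. The sign convention for the import/export balance is not stated in the text; the sketch assumes it is counted as net export (positive when the grid exports). All function names and the example figures are illustrative only and are not data from the paper.

```python
def grid_load(generation_mw: float, self_consumption_mw: float) -> float:
    """Load = total generation minus the self-consumption of the sources."""
    return generation_mw - self_consumption_mw


def consumption(load_mw: float, losses_mw: float,
                pumping_mw: float, net_export_mw: float) -> float:
    """Consumption = load reduced by losses, pumped-storage consumption and net export.

    Assumption: `net_export_mw` is positive when the grid exports power.
    """
    return load_mw - losses_mw - pumping_mw - net_export_mw


# Made-up figures, chosen only to illustrate the arithmetic.
load = grid_load(generation_mw=10000.0, self_consumption_mw=700.0)
print(consumption(load, losses_mw=400.0, pumping_mw=300.0, net_export_mw=600.0))
```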

The consumption is thus always an estimate. Moreover, only larger generation facilities report their output online. Renewable sources in particular (solar, wind, and minor hydro), which are also hard to predict, report with a delay of several days at best. For the purposes of this analysis, the time series of Czech consumption retrospectively published by ČEPS, a.s., the Czech national transmission system operator, is used.
2.1. Analysis

From the statistical point of view, the consumption variable is the sum of the power input of all currently operating appliances, i.e. an aggregate value of a large number of random variables. This attribute of the consumption time series creates the potential for suitably precise predictions. However, the variance of the series is relatively high. The long-term average consumption of the Czech Republic is approximately 8 000 MW, but the maximal levels exceed 11 000 MW while the minimal levels are below 4 500 MW.

Seasonality: The time series of the consumption is strongly seasonal. There are three major seasonal cycles clearly noticeable on the curve. The first is the yearly cycle (see Figure 1), with its maximum at the beginning of January and its minimum between July and August. This cycle, with an amplitude of about 1 800 MW, is mainly caused by the air temperature and the heating season (in winter) and the vacation period (in summer).


[Plot: The electricity consumption in the Czech Republic in 2010]

Figure 1: The course of Czech electricity consumption at hourly resolution. The black line of the long-term moving average represents the intra-year seasonality with several identified irregularities. A - Easter, B - Mass vacations of various industrial corporations, C - Christmas.


The second seasonality is the clearly observable weekly period, with its maximum on Wednesday and Thursday and its minimum on weekends, especially on Sunday. This regular pattern is caused by the business cycle and has an amplitude of about 700 MW. The final regular pattern is the intra-day load curve, commonly called the "camel back", with its maximum at 11 o'clock and its minimum between 4 and 5 in the morning, and an amplitude of about 800 MW.

Figure 1 documents that the regular course of the long-term consumption is significantly disrupted by several deviations; the three major ones are marked by letters. All public holidays falling on work days cause a significant decrease in consumption, but not all of them an equally large one. The strongest drop is caused by Christmas and Easter. There are many nuances in the impact of a particular day off, depending on its position in the week, the season, and other factors. If the day forms a so-called long weekend, the impact is usually stronger. The phenomenon called a bridge day, a work day between two free days, is also well known in the energy sector and is even harder to predict properly.

Trend Factors: Although the long-term trend forecast is not a significant part of short-term model construction, the identification of past trend factors is a vital part of the analysis. Several economic indicators have been examined in order to explain the temperature-independent year-on-year changes. The closest correlation was found with the Industrial Production Index (IPI), published by Eurostat with a lag of approximately 3 months. Figure 2 shows the daily consumption on work days normalised to 20°C, fitted by the course of the IPI. The relation is clearly linear with a ratio of 23 MW per point, where 100 points refers to the average industrial production of the year 2005. The remaining difference between years is negligible compared to the effect of the IPI and does not exceed 50 MW year on year.
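The linear relation can be illustrated by a small Python sketch that converts an IPI change into the corresponding shift of the temperature-normalised consumption. Only the slope of 23 MW per point and the reference of 100 points come from the text; the function name and the example IPI values are hypothetical, and the intercept of the fit is not given, so the sketch returns a difference rather than an absolute level.

```python
MW_PER_IPI_POINT = 23.0   # slope reported in the text
IPI_REFERENCE = 100.0     # average industrial production of the year 2005


def trend_shift_mw(ipi: float, ipi_reference: float = IPI_REFERENCE) -> float:
    """Shift of normalised work-day consumption relative to the reference IPI level."""
    return MW_PER_IPI_POINT * (ipi - ipi_reference)


print(trend_shift_mw(110.0))   # hypothetical IPI ten points above the 2005 average
print(trend_shift_mw(95.0))    # hypothetical IPI five points below it
```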
[Plot: Czech Consumption (norm. to 20 deg C, work days) vs. Industrial Production Index (rescaled); x-axis: Time [Years]]

Figure 2: The average daily consumption on work days (grey circles). The black line represents the linearly rescaled course of the industrial production index.


Temperature and other weather conditions: As already mentioned, the air temperature is the most formative factor affecting the consumption. The heating season (approx. Oct-Apr) can be identified on the chart of the yearly consumption course (Fig. 1). The analysis shows a smooth threshold at an average air temperature of about 12-13°C. The dependency on the temperature is not linear, and the statistical analysis interestingly shows a stronger correlation of the current day's electricity consumption with the average air temperature of the previous days than with the temperature of the current day. The maximal correlation is achieved using the 3-day moving average. Figure 3 shows the fit of a simple neural model in which the current and previous air temperatures are taken as inputs. This model outperformed similar linear and quadratic models, suggesting a close-to-sigmoidal course of the dependency.
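A 3-day moving average of past temperatures, as used for the correlation above, might be computed as in the following Python sketch. It assumes the window covers the three days preceding the current day; the text does not say whether the current day is included. The function name and the temperature values are illustrative.

```python
def trailing_mean(values, window=3):
    """Mean of the `window` values preceding each position.

    Positions without enough history get None.
    """
    result = []
    for i in range(len(values)):
        if i < window:
            result.append(None)
        else:
            result.append(sum(values[i - window:i]) / window)
    return result


daily_mean_temperature = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical values in °C
print(trailing_mean(daily_mean_temperature))
```

The resulting series can then be used as a model input next to the current-day temperature, in the spirit of the simple neural model shown in Figure 3.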

Together with the air temperature, two other weather variables were examined in order to analyse the effect of sunlight intensity on the electricity consumption. The first is the so-called normal irradiation, characterising the theoretical maximal amount of solar energy falling on a specific point on the Earth's surface at a particular moment. The second is the cloud cover, a meteorological value describing the fraction of the sky covered with clouds. This value is measured in okta (eighths of the sky): 0 stands for clear sky, while 8 stands for full cloud cover. Unfortunately, according to a consultation with meteorologists, cloud cover values are currently not measured automatically but only by empirical human observation, or rather estimate.


[Plot: Temperature correction of Consumption (Neural model)]

Figure 3: The dependence of the consumption on the current day temperature and the temperature of the previous three days.

Moreover, the cloud cover does not even try to measure the quality of the clouds, so light high-atmosphere cloudiness is weighted exactly the same as dark and heavy storm clouds, although the shadow magnitude is completely different. In spite of these facts, both of these values proved to be statistically relevant. A rough estimate of the effect of one okta on the total consumption is about 20 MW. Note that this estimate seems realistic considering that the total power consumption of the public street lighting system is about 75 MW.

3. Model

A regression model, from a general point of view, is a function with input and output variables. Two kinds of inputs can be identified; let us call them explicit and implicit. The explicit inputs (such as temperature or sunlight) vary from pattern to pattern and directly affect the output. The implicit inputs are typically unknown; they do not change much over time, but they affect the way the output is linked to the input. Examples of such parameters are the number of households that use electric heating or the fraction of companies closed during Christmas. When designing the model, the aim is to keep the explicit parameters as the model inputs and to let the model learn the implicit parameters from the data.

The goal is to build a neural model of the consumption with an hourly resolution. The straightforward approach would be a network with a set of statistically identified input variables and a single output describing the consumption. The various hours of the day, however, depend on the inputs in very different ways. Consider sunlight, which is a crucial parameter at 7 o'clock in the morning but a totally negligible one at 12 o'clock at night. To make the model distinguish between the hours, additional inputs describing the position of the hour in the daily diagram would have to be involved.

That, however, is a clear example of turning an implicit parameter
(the way the input should be processed) into an explicit parameter
of the model. That is why another approach is chosen. The basic
model unit is a neural network of the classical multilayer
perceptron design with a multidimensional output. The model input
consists of the previously mentioned statistically relevant
variables, while the output consists of 24 variables, one for
every value in the daily consumption diagram. An alternative
approach would be to train 24 networks with a single output each.
Two reasons favour the single model: firstly, a useful interaction
of neighbouring hours during training, which strengthens the
generalisation abilities of the model, can be expected; secondly,
keeping a set of 24 models together with the intention of building
an ensemble would be considerably impractical.

3.1. The negative correlation ensemble

The theoretical idea behind grouping neural networks into ensemble
models is the reduction of the error variance without an increase
of the error bias. At least a minimal level of diversity and
independence of the member networks is tacitly assumed. To
introduce such diversity, various methods can be used, such as
training set alternation or ensemble pruning. In this article, the
approach of negative correlation ensemble learning [3] is applied.
As the name of the method suggests, the algorithm of negative
correlation learning focuses on reducing the covariance between
the outputs of the individual networks while simultaneously
keeping the bias of the networks suitably low.

The ensemble consists of a set of uniform networks trained on the
identical training set. The ensemble output $F(x_i)$ is computed
as an unweighted mean of the $M$ member network outputs
$F_j(x_i)$ for a particular pattern $x_i$:

$$F(x_i) = \frac{1}{M} \sum_{j=1}^{M} F_j(x_i)$$

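All member networks are trained simultaneously to reduce their
error and to differ from one another, which is achieved by
altering the penalty function. In the standard back-propagation
[2] algorithm, the learning error of a single network $j$ over
the $N$ training patterns is calculated as:

$$E_j = \frac{1}{N} \sum_{i=1}^{N} E_j(x_i) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left(F_j(x_i) - d_i\right)^2$$

where $d_i$ denotes the desired output for the pattern $x_i$.

In the negative correlation algorithm, an error term describing
the correlation is added:

$$E_j = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left(F_j(x_i) - d_i\right)^2 + \frac{1}{N} \sum_{i=1}^{N} \lambda\, p_{j,i}$$

where

$$p_{j,i} = \left(F_j(x_i) - F(x_i)\right) \sum_{k \neq j} \left(F_k(x_i) - F(x_i)\right)$$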

Note that the parameter λ is advised to vary between 0
(independent training) and 1 (single huge network). There is no
rigorous way to set the parameter. The empirical value giving the
best results for this experiment is about 0.8.
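
The penalty term can be illustrated with a short NumPy sketch. It is
a minimal illustration under assumptions, not the author's
implementation: the function names, the toy outputs and the scalar
(single-output) setting are hypothetical, and the error follows the
usual negative correlation learning formulation with λ = 0.8.

```python
import numpy as np

def ensemble_output(member_outputs):
    """Unweighted mean over the M member networks (axis 0)."""
    return member_outputs.mean(axis=0)

def nc_penalty(member_outputs):
    """Penalty p[j, i] = (F_j - F) * sum_{k != j} (F_k - F)."""
    deviation = member_outputs - ensemble_output(member_outputs)
    # Sum of the deviations over all members except j.
    others = deviation.sum(axis=0, keepdims=True) - deviation
    return deviation * others

def nc_error(member_outputs, targets, lam=0.8):
    """Per-network error: mean squared term plus lam times the mean penalty."""
    squared = 0.5 * (member_outputs - targets) ** 2
    return (squared + lam * nc_penalty(member_outputs)).mean(axis=1)

# Hypothetical outputs of M = 3 networks on N = 4 patterns.
outputs = np.array([[1.0, 2.0, 3.0, 4.0],
                    [1.2, 1.8, 3.3, 3.9],
                    [0.8, 2.2, 2.7, 4.1]])
targets = np.array([1.0, 2.0, 3.0, 4.0])
errors = nc_error(outputs, targets)
```

Because the deviations from the ensemble mean sum to zero over the
members, the penalty equals the negative squared deviation of network
j from the mean, so minimising it pushes the members apart.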

3.2. Input 

The basic set of input variables roughly follows the conclusions
of the previously mentioned statistical analysis. First of all,
there are three temperature inputs: the average daily air
temperature, the daily minimum temperature and the average of the
three previous daily temperatures. The temperature inputs were
calculated as a population-weighted mean of the values measured in
the three major cities: Prague, Brno and Ostrava. To determine the
level of sunlight, the two previously mentioned variables, the
normal irradiation and the cloud cover, were included.

The trend component is represented by the value of the IPI (see
Fig. 2). The currently unknown future values were estimated using
the forecast of the GDP growth published by the ČNB. The
seasonalities are represented by a unary-coded set of dummy
variables, one for every day of the week and one for every year.
The last, currently unfinished year shares its dummy with the
previous year.
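
The unary coding of the seasonal dummies can be sketched as follows.
This is only an illustration: the function name, its arguments and
the ordering of the flags are assumptions, while the year range
2007-2010 is taken from the training period of the model.

```python
from datetime import date

def seasonal_dummies(day, first_year, last_full_year):
    """Unary (one-hot) code: 7 day-of-week flags followed by one flag per year."""
    weekday = [0] * 7
    weekday[day.weekday()] = 1  # Monday = 0 ... Sunday = 6
    # The unfinished year reuses the dummy of the last complete year.
    year = min(day.year, last_full_year)
    years = [0] * (last_full_year - first_year + 1)
    years[year - first_year] = 1
    return weekday + years

# A Monday in the unfinished year 2011 gets the 2010 year flag.
code = seasonal_dummies(date(2011, 2, 14), first_year=2007, last_full_year=2010)
```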

Several iterations of model training were performed in order to
determine the major error cases. A few other variables had to be
included. First of all, the state and religious holidays had to be
included as a special dummy, as well as the previously mentioned
bridge days. It turns out that the so-called Christmas week
(usually between the 24th of December and the 1st of January)
exhibits a systematic decrease of power consumption and thus
deserves its own input. Finally, variables indicating a longer
weekend (a weekend preceded or followed by a holiday) and a border
day (a work day just before or just after such a longer weekend)
were added.

PhD Conference ’11, ICS Prague. Petr Paščenko: Power Consumption Forecasting ...


[Figure: The actual and predicted consumption levels in Feb. 2011]

Figure 4: The test set model performance. The course of the
actual consumption (black) and the day-ahead forecast
(gray) in February 2011.

Two more periodic events had to be considered. The first is the
beginning of the summer holiday, traditionally followed by mass
vacations in many industrial factories, which causes a
considerable decrease of the power consumption. The second is the
daylight saving time change in March and October. Contrary to
common belief, the effect of daylight saving time on the total
power consumption is marginal; what is moderately affected is the
shape of the daily diagram. For both events, a dummy was included
in the set of model inputs.

3.3. Output 

As previously mentioned, the model has 24 outputs describing the
hourly consumption levels. As the consumption is an additive
variable, the output variables represent the absolute value of the
consumption rather than a difference from the consumption normal
or another commonly used difference coding. This approach is
motivated by the effort to model the contribution of specific
inputs to the total consumption as well as their interactions.

3.4. Performance 

The model was trained on historical data covering the years
2007-2010. The last member of the training set is the 6th of
December 2010. Since then, the model has not been retrained and it
runs every day as a consumption forecaster using the currently
most plausible weather forecast. The current test set thus covers
the period between December 2010 and June 2011.

The MAE of the day-ahead prediction of the model at the hourly
resolution on the specified test set is 124.2 MW, and the
corresponding PMAE is 1.5%. Several notoriously precarious events
over the year can be identified. If, for instance, the Christmas
week is omitted from the test set, the MAE decreases to 114.8 MW
(PMAE to 1.4%).
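
The two error measures can be computed as in the following sketch.
The PMAE is not defined in this section, so the code assumes it is
the MAE expressed as a percentage of the mean actual consumption;
the function names and the toy values are hypothetical.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error in the units of the input (MW here)."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(forecast))))

def pmae(actual, forecast):
    """MAE as a percentage of the mean actual consumption (assumed definition)."""
    return 100.0 * mae(actual, forecast) / float(np.mean(actual))

# Hypothetical hourly values in MW, for illustration only.
actual = np.array([8000.0, 8200.0, 8400.0, 8600.0])
forecast = np.array([8100.0, 8100.0, 8500.0, 8500.0])
```

Under this assumed definition, an MAE of 124.2 MW and a PMAE of 1.5%
would correspond to a mean load of roughly 8,300 MW.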

The performance of the model varies during the year and depends
significantly on the quality of the weather forecast. In the
summer, the weather is more stable and the consumption is less
irregular. In that period, the model forecast is almost perfect
(see Fig. 5).


[Figure: The actual and predicted consumption levels in June 2011 (horizontal axis: time in hours)]

Figure 5: The test set model performance. The course of the
actual consumption (black) and the day-ahead forecast
(gray) in June 2011.

In contrast, the winter and, even more so, the transition period
of the early spring are characterised by hardly predictable
weather, which significantly affects the model performance (see
Fig. 4). A serious source of the prediction error is the so-called
inversion cloudiness, characterised by low clouds and fogs that
lie below the minimal altitude level of the meteorological models
of cloud cover.

To improve the performance of the model for the day-ahead
forecast, a simple auto-regression correction based on the
previous day's error was implemented. This small enhancement
reduced the day-ahead error to 94.3 MW (PMAE to 1.2%).
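
One possible form of such a correction is sketched below. The text
does not specify the exact scheme, so this is an assumption: a single
coefficient `a` is fitted by least squares so that today's error is
approximated by `a` times yesterday's error at the same hour, and the
raw forecast is shifted accordingly. All names are hypothetical.

```python
import numpy as np

def fit_ar_coefficient(errors):
    """Least-squares a in e[t] ~ a * e[t-1]; `errors` has shape (days, 24)."""
    prev, curr = errors[:-1].ravel(), errors[1:].ravel()
    return float(prev @ curr / (prev @ prev))

def corrected_forecast(raw_forecast, previous_error, a):
    """Shift today's raw forecast by a fraction of yesterday's error."""
    return raw_forecast + a * previous_error

# Toy errors (actual minus forecast) that halve from day to day.
errors = np.array([[8.0] * 24, [4.0] * 24, [2.0] * 24])
a = fit_ar_coefficient(errors)
today = corrected_forecast(np.full(24, 100.0), errors[-1], a)
```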

4. Summary

In this article, a complete methodology of a neural model for
short-term consumption forecasting is briefly described. The final
model performance, with an error of about 1.2%, can be considered
a successful example of a neural network application to a real
industrial problem.

Building a model with an hourly resolution is a difficult task
demanding several fundamental decisions. The model presented in
the article separates the objective consumption drivers from the
auto-regressive inputs and thus can be used for scenario-based
long-term predictions. Still, when available, the information
about the previous course of the consumption can also be utilised
by the linear regression correction. In future work, this simple
mechanism can be replaced by a more sophisticated sub-model in
order to improve the overall model performance.

Another topic for future work is the way the model deals with
irregularities such as holidays and Christmas. There is no doubt
that these events are a significant source of model error, and a
special model dealing with them could be useful. The alternative
of splitting the training set into well-chosen sectors, such as
seasons or work and free days, in order to train more specialised
models is another promising topic.

Abstract

The use of meta-models has a long tradition in the field of
evolutionary computation. However, it is not well studied in the
field of evolutionary multiobjective optimization. In this paper,
we present a multiobjective evolutionary algorithm with local
meta-models and compare its performance to traditional
multiobjective evolutionary algorithms.

1. Introduction 

Evolutionary algorithms for multiobjective optimization 
are among the best methods for solving optimization 
problems with multiple objectives. 

In the past years, several multiobjective evolutionary algorithms
(MOEAs) [1-4] were proposed and used to deal with these problems.
However, most of them require many evaluations of each objective
function, which makes them problematic for solving real-life
problems. These problems may have complex objective functions
whose evaluations are expensive (either in terms of time or
money).

Two main approaches are used to make MOEAs more usable. One of
them is parallelization, the other is the use of meta-models.
Parallelization only helps to reduce the overall run-time; any
costs associated with the evaluation (e.g. running a physical
experiment) remain.

Meta-models aim at lowering the number of objective function
evaluations in a different way. They replace the original
objective function with a model of it. There are several ways to
obtain these models. One of them, used especially in engineering,
is to use a different physical model (some of the less important
variables can be ignored). Other approaches use response surface
methods, regression, or other mathematical methods. Yet another
approach is to use a model based on computational intelligence,
e.g. RBF networks or multilayer perceptrons.

In this paper, we present our multiobjective evolutionary
algorithm with an aggregate meta-model. First, however, we define
the problem of multiobjective optimization, briefly present
existing multiobjective evolutionary algorithms (MOEAs) and
describe the use of meta-models in MOEAs. Finally, our algorithm
is compared to existing MOEAs in terms of the number of objective
function evaluations needed.

2. Multiobjective optimization 

In contrast to single-objective optimization, in multiobjective
optimization there are several objective functions that should be
optimized simultaneously. These objective functions are usually
conflicting, and thus there is no single solution that would be
optimal for all of them. This leads to a set of so-called Pareto
optimal solutions.

The following definitions introduce the multiobjective
optimization problem and the Pareto dominance relation, which is
used to compare two potential solutions to the problem.

Definition 1 The multiobjective optimization problem (MOP) is a
quadruple $(D, O, f, C)$, where

• $D$ is the decision space

• $O \subseteq \mathbb{R}^n$ is the objective space

• $C = \{g_1, \ldots, g_m\}$, where $g_i : D \to \mathbb{R}$, is the
set of constraint functions (constraints) defining the feasible
space $\Phi = \{x \in D \mid g_i(x) \le 0\}$

• $f : \Phi \to O$ is the vector of $n$ objective functions
(objectives), $f = (f_1, \ldots, f_n)$, $f_i : \Phi \to \mathbb{R}$

$x \in D$ is called the decision vector and $y \in O$ is denoted as
the objective vector.

Usually, only minimization problems are considered, as
maximization and mixed problems may be easily transformed into
minimization ones.

In the field of multiobjective optimization, problems with more
than 4 objectives are often called many-objective, as this higher
number of objectives poses additional challenges for MOEAs (e.g.
the dominance relation defined in the next paragraph loses its
power to discriminate between good and bad individuals, as most of
them are mutually incomparable).

To compare two decision vectors, we define the so-called Pareto
dominance relation. If one vector is not worse than another (has
lower or equal objective values) for all of the objective
functions, we say it weakly dominates the other vector. This is
formally stated in the following definition.

Definition 2 Given decision vectors $x, y \in D$, we say

• $x$ weakly dominates $y$ ($x \preceq y$) if
$\forall i \in \{1, \ldots, n\} : f_i(x) \le f_i(y)$.

• $x$ does not dominate $y$ ($x \nprec y$) if $y \preceq x$ or $x$
and $y$ are incomparable.

Now we can state the goal of multiobjective optimization: it is to
find those decision vectors which are minimal in the Pareto
dominance relation.

Definition 3 The solution of a MOP is the Pareto (optimal) set

$$P^* = \{x \in \Phi \mid \forall y \in \Phi : y \nprec x\}$$

The projection of $P^*$ under $f$ is called the Pareto optimal
front.
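
The dominance relation and the resulting Pareto set can be
illustrated on a finite set of objective vectors. The sketch below is
only an illustration with hypothetical names; it assumes the usual
strict dominance (x weakly dominates y, but not the other way round)
and minimization of all objectives.

```python
def weakly_dominates(a, b):
    """True if objective vector a is no worse than b in every objective."""
    return all(ai <= bi for ai, bi in zip(a, b))

def dominates(a, b):
    """Strict dominance: a weakly dominates b, but b does not weakly dominate a."""
    return weakly_dominates(a, b) and not weakly_dominates(b, a)

def pareto_set(points):
    """Keep the points that no other point strictly dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

points = [(1, 4), (2, 2), (3, 3), (4, 1)]
front = pareto_set(points)  # (3, 3) is dominated by (2, 2)
```

Note that (1, 4) and (4, 1) are incomparable: neither dominates the
other, so both stay in the set.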

The Pareto optimal set is usually infinite for continuous
optimization, and thus we usually seek a finite approximation of
this set. This approximation should be close to the Pareto set
(ideally, it is a subset of it) and should also be evenly
distributed along the Pareto front.

We can extend the Pareto dominance relation to such approximations
and compare them using this relation. However, as the ordering is
only partial, there would be pairs of approximations which are
mutually incomparable (in fact, most such pairs would be
incomparable). As we want to compare approximations, which are
solutions found by a multiobjective optimizer, we need a way to
compare any two sets.

During the past years, many measures were proposed to compare such
Pareto set approximations, and one of the most often used is the
hypervolume indicator [5]. This indicator expresses the
hypervolume of the part of the objective space that is dominated
by the solutions.

Definition 4 Let $R \subset O$ be a reference set. The hypervolume
metric $S$ is defined as

$$S(A) = \lambda(H(A, R))$$

where

• $H(A, R) = \{x \in O \mid \exists a \in A\ \exists r \in R : \forall i \in \{1, \ldots, n\} : f_i(a) \le x_i \le r_i\}$,
where $f_i$ is the $i$-th objective function

• $\lambda$ is the Lebesgue measure with
$\lambda(H(A, R)) = \int_O \mathbf{1}_{H(A,R)}(z)\,dz$, and
$\mathbf{1}_{H(A,R)}$ is the characteristic function of the set
$H(A, R)$

The reference set bounds the hypervolume from above. It usually
contains only a single reference point. We should note here that
although the definition of the hypervolume indicator is quite
simple, its computation is known to be #P-complete and its
complexity grows exponentially with the number of objectives.
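
For two objectives and a single reference point, the hypervolume can
be computed by a simple sweep, as in the following sketch. It is an
illustration only (hypothetical function name, minimization assumed)
and does not address the general many-objective case discussed above.

```python
def hypervolume_2d(points, reference):
    """Area dominated by `points` and bounded above by `reference`."""
    rx, ry = reference
    # Only points that dominate the reference point contribute.
    inside = sorted(p for p in points if p[0] < rx and p[1] < ry)
    area, best_y = 0.0, ry
    for x, y in inside:          # sweep in order of increasing first objective
        if y < best_y:           # skip points dominated by an earlier one
            area += (rx - x) * (best_y - y)
            best_y = y
    return area

area = hypervolume_2d([(1, 3), (2, 2), (3, 1), (3, 3)], reference=(4, 4))
```

Each non-dominated point adds the rectangle between itself, the
reference point and the best second objective seen so far, so no area
is counted twice.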

3. Multiobjective evolutionary algorithms 

Traditionally, evolutionary algorithms have one fitness function.
However, in multiobjective optimization, we need to optimize
multiple functions at once. Another difference is that in
multiobjective optimization we seek a set of solutions instead of
a single one. This implies some differences between
single-objective and multiobjective evolutionary algorithms.

MOEAs usually do not return a single solution; rather, the whole
population in the last generation (or an external archive) is
returned as the solution. The algorithms also differ in how they
select individuals for the next generation. They can be divided
into three groups based on the type of selection they perform.

The first group, represented by the oldest multiobjective
evolutionary algorithm, uses some kind of scalarization, or
aggregation, during the fitness assignment. VEGA [6], the oldest
MOEA, used a different objective function in each generation, thus
finding compromise solutions. However, this often leads to
convergence towards the optima of the respective objective
functions, and only a few compromise solutions remain in the
population. A newer algorithm from the same group, MSOPS [7], uses
weighted sums of objectives to create a ranking matrix, which is
later used during the selection (as a simplification: objective
vectors which yield better values of the weighted sum more often
are better and have a higher probability of being selected).

Another group of algorithms, represented e.g. by the well-known
NSGA-II algorithm [1], uses the dominance relation during the
selection process. Usually, the population is divided into
so-called non-dominated fronts. Individuals which are not
dominated by any other individual in the population are assigned
front number 1. These are temporarily removed, and individuals
non-dominated by the rest are assigned front number 2; this
process is iterated as long as there are any individuals left in
the population. Then, individuals from fronts with lower numbers
are selected first. There are usually other criteria to
discriminate between individuals in the same front; in the case of
NSGA-II it is the so-called crowding distance, which roughly
corresponds to the distance to the closest individual in the
objective space (individuals from less crowded regions are given
preference).
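
The division into non-dominated fronts described above can be
sketched as follows. This is a naive illustration with hypothetical
names, not the faster bookkeeping used by NSGA-II itself, and it
omits the crowding distance.

```python
def dominates(a, b):
    """Strict Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(points):
    """Peel off non-dominated fronts; returns a list of fronts (front 1 first)."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        # Temporarily removed individuals do not take part in later fronts.
        remaining = [p for p in remaining if p not in front]
    return fronts

fronts = nondominated_fronts([(1, 4), (2, 2), (3, 3), (4, 1), (4, 4)])
```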

Yet another group of algorithms is based on indicators. These
indicators usually refine the dominance relation in some way. One
of the algorithms in this group is IBEA [3]. This algorithm uses a
binary indicator $I$, which compares two individuals, to assign
the fitness in the following way: the indicator value of each pair
of individuals is computed, and an individual $i$ from the
population $P$ is assigned the fitness

$$F(i) = \sum_{j \in P \setminus \{i\}} -e^{-I(j, i)/\kappa}$$

Here, $\kappa$ is a scaling factor which has to be set in advance.
The purpose of the exponential is to amplify the differences
between dominated and non-dominated individuals. An example of
such an indicator is the $\epsilon^+$ indicator, which expresses
how much an objective vector needs to be moved to become dominated
by the other vector. The following definition states this
formally.

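Definition 5 Let $A$, $B$ be two sets of decision vectors. Then

$$I_{\epsilon^+}(A, B) = \min_{\epsilon} \{\forall x \in B\ \exists y \in A : f_i(y) - \epsilon \le f_i(x) \text{ for all } i \in \{1, \ldots, n\}\}$$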

Indicator-based MOEAs are among the most modern ones. Some of them
even use the hypervolume indicator directly. In this case, they
must somehow overcome the complexity of the computation of this
indicator in order to scale well to problems with many objective
functions. One such algorithm, HypE [4], solves this problem by
using Monte Carlo sampling to estimate the hypervolume indicator.
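
For single individuals, the additive ε indicator and the IBEA-style
fitness assignment described earlier in this section can be sketched
as follows. The function names and the toy population are
hypothetical, the indicator is applied to objective vectors directly,
and the value of the scaling factor is chosen arbitrarily for the
illustration.

```python
import math

def eps_plus(a, b):
    """Additive epsilon indicator of two objective vectors (minimization):
    the smallest shift eps such that a[i] - eps <= b[i] for every objective."""
    return max(ai - bi for ai, bi in zip(a, b))

def ibea_fitness(population, kappa):
    """Fitness F(i) = sum over j != i of -exp(-I(j, i) / kappa)."""
    return [sum(-math.exp(-eps_plus(other, ind) / kappa)
                for j, other in enumerate(population) if j != i)
            for i, ind in enumerate(population)]

population = [(1.0, 3.0), (2.0, 2.0), (3.0, 3.0)]
fitness = ibea_fitness(population, kappa=1.0)
```

The dominated vector (3, 3) receives the lowest fitness, because the
indicator values towards it are zero or negative and the exponential
amplifies them.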


4. Meta-models in MOEAs 

When dealing with single-objective problems, there are three main
ways of incorporating a meta-model into the evolutionary
algorithm:

• Meta-models are used directly instead of the fitness function:
the fitness function is replaced by the meta-model and the
meta-model is optimized. In the extreme case, this is done at the
beginning and the model never changes thereafter; more usually,
the model is updated after a given number of generations.

• Meta-models are used to pre-evaluate individuals: each
individual is evaluated by the meta-model to estimate its quality,
but only the best individuals are evaluated by the original
fitness function.

• Meta-models are used in some kind of memetic operator: this
operator takes some of the individuals and moves them closer to
the (local) optimum of the meta-model. Gradient methods and other
local optimization methods (even evolutionary algorithms) may be
used in this case.

In multiobjective optimization, the situation is more complicated,
as the approaches differ in what the models predict and how. In
one of the first approaches [8], the authors used NSGA-II [1] and
replaced the objective functions with their meta-models.

Other algorithms use some kind of aggregation of the objectives.
In [9], the authors describe an aggregate meta-model based on the
combination of a one-class SVM and support vector regression.
Their model is trained to differentiate between dominated and
non-dominated individuals, and it is used during the evolution to
pre-evaluate the individuals and drop those which are not
promising. The same authors proposed a similar approach based on
rank-based SVM [11] in [10].

Although the memetic variant is also possible in the
multiobjective setting, only a few references dealing with
meta-model assisted multiobjective memetic algorithms were found
in the literature. In [12], the authors propose such an algorithm.
They use a meta-model (in this case, RBF networks) for each of the
objective functions. During the local search, one of the
objectives is selected for refinement, and a local meta-model is
trained and used during the local search.

In [13], the authors propose another method: they use a
single-objective meta-model assisted evolutionary algorithm in the
local search phase. Two different local meta-models are used, both
trained to approximate a weighted sum of the objectives. One is an
ensemble model, the other is a low-order polynomial. Two
single-objective algorithms are run to find the optima of the
respective models, which are then precisely evaluated. A selection
procedure is then used to decide which of the individuals (if any)
is added to the population.

In this paper, we present another multiobjective memetic algorithm
with an aggregate meta-model.

5. LAMM-MMA 

LAMM-MMA is a variant of another algorithm we proposed earlier,
called ASM-MOMA [15]. ASM-MOMA uses the distance to the currently
known Pareto front as the target value predicted by the
meta-model. This meta-model is used inside a memetic operator,
which improves some of the individuals in the population.

This operator uses only meta-model evaluations and thus does not
increase the number of real objective function evaluations, which
are considered expensive.

The main difference between ASM-MOMA and LAMM-MMA is that LAMM-MMA
uses local meta-models instead of a single global one.

More specifically, LAMM-MMA uses an existing multiobjective
evolutionary algorithm (almost any of them can be used) and adds a
memetic operator and an archive of evaluated individuals. This
archive is used during the creation of the training set for the
meta-model.

The memetic operator improves an individual $I$ from the current
population in the following way. Given the archive of evaluated
individuals $A$, the weighted training set for the individual $I$
is created as

$$T_I = \left\{ (x_i, y_i, w_i) \;\middle|\; x_i \in A,\ y_i = -d(x_i, P),\ w_i = \frac{1}{1 + \lambda\, d(x_i, I)} \right\}$$

where $d(x, y)$ is the Euclidean distance of individuals $x$ and
$y$ in the decision space, $P$ is the set of non-dominated
individuals in the archive and $d(x, P)$ is the distance of the
individual $x$ to the closest point in the set $P$. $\lambda$ is a
parameter which controls the locality of the model: larger values
of $\lambda$ lead to a more local model, whereas lower values lead
to a more global one.
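
The construction of the weighted training set can be sketched with
NumPy as follows. The function and argument names are hypothetical
and the toy archive is for illustration only; the targets and weights
follow the description above, with the locality parameter set to 1 as
in the experiments.

```python
import numpy as np

def weighted_training_set(archive, nondominated, individual, lam=1.0):
    """Return inputs X, targets y = -d(x, P) and weights w = 1 / (1 + lam * d(x, I))."""
    X = np.asarray(archive, dtype=float)
    P = np.asarray(nondominated, dtype=float)
    I = np.asarray(individual, dtype=float)
    # Distance of every archive member to its closest non-dominated point.
    dist_to_front = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=2).min(axis=1)
    dist_to_individual = np.linalg.norm(X - I, axis=1)
    return X, -dist_to_front, 1.0 / (1.0 + lam * dist_to_individual)

archive = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
X, y, w = weighted_training_set(archive, nondominated=[(0.0, 0.0)],
                                individual=(0.0, 0.0), lam=1.0)
```

Archive members far from the improved individual get small weights,
so a weighted regression fitted to this set is dominated by the
neighbourhood of the individual.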

A meta-model is then trained using this training set. We used
three different types of meta-models during the testing: linear
regression, support vector regression, and multilayer perceptrons.
However, other types of models may be used; e.g. RBF networks are
also a common choice in this field.

Finally, after the model is trained, a single-objective
evolutionary algorithm with the meta-model as its fitness function
is started. The initial population of this algorithm is created by
perturbing the values of the individual $I$. The individual $I$
itself is also added to the initial population. This evolutionary
algorithm seeks the local optima of the meta-model around the
individual $I$. The best individual found is then returned to the
population of the external multiobjective algorithm as the result
of the memetic operator.

After each iteration of the external algorithm, the newly
evaluated individuals are added to the archive of evaluated
individuals, and this archive is truncated so that it does not
grow indefinitely and does not use large amounts of memory. The
truncation procedure is very simple: random individuals from the
archive are selected and removed until the size of the archive
falls under a specified limit. Although the procedure is rather
simple, it ensures that individuals from the more recent
generations remain in the archive with higher probability than
older individuals. We also tried other truncation strategies (e.g.
one similar to the selection procedure in NSGA-II), but the random
strategy works significantly better.
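
The random truncation can be sketched in a few lines. The function
name and signature are hypothetical; the sketch keeps a uniform
random subset of the merged archive whenever the limit is exceeded.

```python
import random

def update_archive(archive, new_individuals, limit, rng=random):
    """Add newly evaluated individuals, then drop random members down to `limit`."""
    archive = archive + list(new_individuals)
    if len(archive) > limit:
        archive = rng.sample(archive, limit)  # uniform sample without replacement
    return archive

rng = random.Random(0)
archive = update_archive(list(range(400)), range(400, 450), limit=400, rng=rng)
```

An individual has to survive one truncation per generation it spends
in the archive, so its chance of still being present decays
geometrically with its age, which is why recent individuals are
favoured.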

6. Experiments 

To compare the results, we use a measure we call $H_{ratio}$,
which is defined as

$$H_{ratio} = \frac{H_{real}}{H_{optimal}}$$

where $H_{real}$ is the hypervolume of the dominated space
attained by the algorithm and $H_{optimal}$ is the hypervolume of
the real Pareto set. As the Pareto set is known for all the ZDT
problems, we can compute this number directly. We use the vector
$\mathbf{2} = (2, 2)$ as the reference point in the hypervolume
computation. All points that do not dominate the reference point
are excluded from the hypervolume computation.


PhD Conference ’11, ICS Prague. Martin Pilát: Multiobjective Memetic Algorithm...


Parameter                          MOEA value                     Local search value
Stopping criterion                 50,000 objective evaluations   30 generations
Population size                    50                             50
Crossover operator                 SBX                            SBX
Crossover probability              0.8                            0.8
Mutation operator                  Polynomial                     Polynomial
Mutation probability               0.1                            0.2
Archive size                       400                            -
Memetic operator probability       0.25                           -
Meta-model locality parameter λ    -                              1

Table 1: Parameters of the multiobjective algorithm


We ran tests on the well-known set of ZDT functions [16]. Although
we tested various types of meta-models (namely linear regression,
support vector regression, and multilayer perceptrons), we present
only the results of linear regression here*. The parameters of
LAMM-MMA are presented in Table 1. We used NSGA-II as the external
multiobjective evolutionary algorithm.

In Table 2, we present the median number of objective function
evaluations needed to attain the specified $H_{ratio}$ over 20
runs for each of the configurations. NSGA-II is compared to
ASM-MOMA and LAMM-MMA with linear regression as the meta-model.

We can see that the use of a global meta-model in ASM-MOMA
generally greatly reduces the number of objective function
evaluations needed to attain the specified $H_{ratio}$. The local
meta-models of LAMM-MMA reduce this number further by almost
another 10%. Although the difference may seem small, it can
translate to reductions of run-time and great reductions of costs
in practical tasks.

On ZDT1, ASM-MOMA reduced the number of evaluations needed to
attain $H_{ratio} = 0.95$ from more than 20,000 to 2,800. LAMM-MMA
reduced this number further to 2,600. This means a 7.4 times lower
number for ASM-MOMA and an almost 8 times lower number for
LAMM-MMA. For $H_{ratio} = 0.99$, the reductions are not that
large; however, there is still a reduction of about 40% in the
number of evaluations.

On ZDT2, the results for $H_{ratio} = 0.99$ show reductions by a
factor of 6.3 for ASM-MOMA and 7.2 for LAMM-MMA. Again, LAMM-MMA
needed a lower number of function evaluations than ASM-MOMA (by
more than 10%).
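
The factors quoted for ZDT1 and ZDT2 can be re-derived from the
medians reported in Table 2. The following sketch only repeats that
arithmetic; the dictionary layout and the function names are
illustrative.

```python
# Median evaluation counts copied from Table 2.
medians = {
    ("ZDT1", 0.95): {"NSGA-II": 20750, "ASM-MOMA": 2800, "LAMM-MMA": 2600},
    ("ZDT1", 0.99): {"NSGA-II": 21850, "ASM-MOMA": 12750, "LAMM-MMA": 13100},
    ("ZDT2", 0.99): {"NSGA-II": 7900, "ASM-MOMA": 1250, "LAMM-MMA": 1100},
}

def reduction_factor(row, algorithm):
    """How many times fewer evaluations than NSGA-II the algorithm needed."""
    return row["NSGA-II"] / row[algorithm]

def relative_saving(row, algorithm):
    """Fraction of the NSGA-II evaluations saved by the algorithm."""
    return 1.0 - row[algorithm] / row["NSGA-II"]

zdt1_asm = reduction_factor(medians[("ZDT1", 0.95)], "ASM-MOMA")
zdt2_lamm = reduction_factor(medians[("ZDT2", 0.99)], "LAMM-MMA")
```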


H_ratio              0.5     0.75    0.9     0.95    0.99

ZDT1
  NSGA-II            5600    18600   19850   20750   21850
  ASM-MOMA-LR        1500    2000    2400    2800    12750
  LAMM-MMA-LR        1300    1750    2250    2600    13100

ZDT2
  NSGA-II            650     1650    3550    5050    7900
  ASM-MOMA-LR        350     550     750     950     1250
  LAMM-MMA-LR        350     450     600     850     1100

ZDT3
  NSGA-II            600     1250    4150    7250    -
  ASM-MOMA-LR        300     500     700     800     1150
  LAMM-MMA-LR        300     450     650     800     1050

ZDT6
  NSGA-II            7950    10200   13950   17700   28650
  ASM-MOMA-LR        2750    5950    11100   15750   30500
  LAMM-MMA-LR        2850    5850    10550   15350   29200

Table 2: Results of LAMM-MMA on the selected benchmark
functions

On ZDT3, the original NSGA-II was not able to reach
$H_{ratio} = 0.99$ (there was a limit of 50,000 objective function
evaluations), whereas ASM-MOMA and LAMM-MMA attained this value
after 1,150 and 1,050 function evaluations, respectively. This
means a reduction by a factor of more than 40. For
$H_{ratio} = 0.95$, both algorithms reached this value after 800
function evaluations, which is 9 times fewer than NSGA-II needed.

ZDT6 is the hardest problem for our approach, and there is no
reduction in the number of function evaluations when NSGA-II is
used as the external evolutionary algorithm and linear regression
is used as the meta-model. In fact, the results are even slightly
worse than those of the original NSGA-II. We have seen some slight
reductions for different configurations; however, these were by
far not as significant as those observed on the other test
problems.


* The rest (and more) of the results were presented at the GECCO ’11 [17] and ICIC ’11 [18] conferences, and were also submitted to the Neurocomputing journal [19].

7. Conclusions

We presented a multiobjective memetic algorithm with local
meta-models. This algorithm helps to significantly reduce the
number of function evaluations on most of the presented test
problems. The comparison shows that local meta-models provide
another 10% advantage over a single global meta-model. Although
the advantage may seem small, it might have great practical
consequences and may lead to large savings when applied to
objective functions which are expensive to evaluate.

The disadvantage of the local meta-models compared to a single
global one is the need to train the model multiple times in each
generation, which adds further overhead. This must be considered
when the algorithm is applied in practice.

We also found a problem (ZDT6) where our approach did not work
well. This problem provides motivation for further research.
Another open question is the effect of the locality parameter
$\lambda$ on the convergence speed and the quality of the obtained
solutions.

Abstract

RCU is a synchronization mechanism that can increase concurrency
in parallel algorithms, improving scalability in comparison to
mutual exclusion. RCU provides asymmetric synchronization of
concurrent writers and readers sharing a data structure. Unlike
mutual exclusion primitives, RCU can avoid expensive memory
operations on the most frequent code paths, boosting performance
even on uniprocessors. Virtually all contemporary RCU
implementations run in the Linux kernel and strongly depend on its
internals.

Our work contributes a novel RCU algorithm based on easily
portable foundations, not bound to any particular kernel
architecture. We implemented and benchmarked our algorithm in the
UTS kernel used by Solaris-based systems. We compared our RCU
algorithm to a readers-writer lock and to a portable but
feature-constrained RCU algorithm called QRCU. Our benchmarks
suggest that the novel algorithm can outperform both
readers-writer locks and QRCU on current SMP systems.

1. RCU Essentials 

Read-Copy-Update (RCU) is a means of communi- 
cation among three types of entities: readers, writers and 
reclaimers [1]. 

Readers access a shared data structure without modi- 
fying it and can run in parallel with other readers and 
writers, guaranteed to never block when entering or lea- 
ving their critical sections. Most RCU implementations 
do not require the readers to use atomic instructions or 
other expensive operations. 

Writers are a specific type of readers that can also
modify the shared data. Writers cooperate with the RCU
mechanism to provide other readers with an illusion of
data integrity, i.e. readers will not observe concurrent
changes to the shared data during their critical sections.
This is achieved by copying the shared data structure,
making changes to the copy and finally atomically replacing
the pointer to the original data structure with a pointer to
the new one. As long as readers adhere to
certain data access rules, they always observe a consistent
state of the data structure. RCU neither supports
nor constrains concurrency among writers; they have to
synchronize their operations by means external to RCU.
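
The copy-then-publish update described above can be sketched in user-space Python. This is only an illustration under stated assumptions: the class name `RcuProtected` and its methods are hypothetical, writers are serialized by an ordinary lock (standing in for the "means external to RCU"), publication relies on a reference assignment being observed as a whole by readers in CPython, and old versions are left to the garbage collector, so grace period detection is not modelled at all.

```python
import threading


class RcuProtected:
    """Illustrative copy-update container (hypothetical, not the paper's kernel code)."""

    def __init__(self, initial):
        self._current = dict(initial)
        # Writers must synchronize among themselves by external means.
        self._writer_lock = threading.Lock()

    def read(self):
        # Readers take one reference to the current version and never lock.
        # A published version is never mutated, so the snapshot stays consistent.
        return self._current

    def update(self, key, value):
        with self._writer_lock:
            new_version = dict(self._current)  # copy the shared structure
            new_version[key] = value           # change only the private copy
            self._current = new_version        # publish by swapping the reference


table = RcuProtected({"a": 1})
snapshot = table.read()   # a reader inside its critical section
table.update("a", 2)      # a concurrent writer publishes a new version
```

After the update, `snapshot` still maps `"a"` to 1, while a new call to `table.read()` sees 2; this is the consistency that readers rely on.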

Deallocation of old versions of protected data has to
be postponed, so that readers accessing them can finish
their work. The time needed for all potential readers to
stop using the old data structure (no longer accessible to
newly arriving readers) is called a grace period. A moment
when a potential reader does not access any data
structure protected by RCU is called a quiescent state.
A grace period elapses when all potential readers go
through at least one quiescent state. Grace period
detection is the key part of all RCU implementations.
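
The definition of a grace period can be restated as a small bookkeeping sketch. The class `GracePeriod` and the reader names are hypothetical; the code only tracks which potential readers have not yet reported a quiescent state and says nothing about how a real kernel observes them.

```python
class GracePeriod:
    """Tracks one grace period over a fixed set of potential readers."""

    def __init__(self, potential_readers):
        # Every potential reader still owes one quiescent state.
        self._pending = set(potential_readers)

    def quiescent_state(self, reader):
        # The reader is at a point where it holds no RCU-protected reference.
        self._pending.discard(reader)

    def elapsed(self):
        # The grace period is over once nobody is pending.
        return not self._pending


gp = GracePeriod(["cpu0", "cpu1", "cpu2"])
gp.quiescent_state("cpu0")
gp.quiescent_state("cpu2")
still_waiting = not gp.elapsed()   # cpu1 has not passed a quiescent state yet
gp.quiescent_state("cpu1")
done = gp.elapsed()                # now the old data may be reclaimed
```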

Reclaimers deallocate outdated data structures that have
been made inaccessible to readers. It is necessary to wait
for at least one grace period before the deallocation can
be done. Writers can use the RCU mechanism to block
for at least one grace period, becoming reclaimers
afterwards. Alternatively, they can proceed immediately,
asking the RCU mechanism to perform the deallocation
when appropriate. Our novel RCU algorithm supports
both of these options.

2. The RCU Algorithm for UTS 

The cornerstone RCU algorithms in the Linux kernel are
strongly bound to features specific to Linux, such as
timer interrupt handling on all processors. In the UTS kernel,
timer interrupts are only handled by a subset of available
processors, which may include only one processor
on UMA machines [2]. This fundamental difference
makes porting of the key Linux RCU algorithms to UTS or
other kernels technically infeasible. The design of our
novel RCU algorithm strives to avoid technical
dependencies related to one particular kernel.

The key idea behind our algorithm can be illustrated
on a "toy" RCU algorithm presented by Paul McKenney [3]:
Writers context-switch themselves to each available
processor before they become reclaimers. Since
readers run with disabled preemption, the writers'
behavior guarantees that at least one grace period must
have elapsed. Presumably, this algorithm is unusable in
practice. First, its SMP scalability would be extremely
poor. Second, it does not support non-blocking writers
and delayed batched resource reclamation.

Based on the principle mentioned above, we designed
a more scalable algorithm where forced rescheduling is
only used as the last resort when other means of grace
period detection take too long to complete. Our novel
algorithm differs from the trivial example above in a number
of ways. First, all grace period detection requests are
batched and handled centrally by one detector thread,
which avoids the need to reschedule each writer on each
processor on each request. Second, the central detector
thread avoids forced migration in most cases, at the cost
of slightly higher overhead on the readers' side. Third,
most of the advanced RCU features, such as asynchronous
reclamation, are implemented.

A brief note on notable characteristics of our algorithm
follows. Readers do not use any expensive atomic
instructions. Readers only execute memory barriers when
intensive grace period detection takes place; they
never do so in the absence of grace period requests.
Naturally occurring quiescent states (context switches, idle
processors) are observed to reduce the grace period
detection overhead even further. As long as all read-side
critical sections take a bounded amount of time (which
can be required and relied upon in a kernel environment),
grace period duration is also bounded. Asynchronous
reclamation requests are handled in efficient
batches by the same processor on which they were created,
so that a warm cache can be exploited.

3. Evaluation 

To verify that our RCU algorithm for the UTS kernel
leads to performance improvements typical for well-known
RCU implementations, we created a benchmarking
harness that performs a series of operations on
a non-blocking hash table. This artificial workload
simulates a kernel algorithm manipulating a data mapping
under heavy stress. The same workload (sequences of
hash table operations performed by multiple threads in
parallel) has been benchmarked with four different
synchronization mechanisms protecting the hash table. We
ran our benchmark on a variety of SPARCv9 and x86-64
SMP machines.



         511:1   127:1   31:1    7:1     1:1
RCUc     1       1.04    1.06    1.12    1.27
RCUs     1.03    1.23    1.48    2.23    5.24
QRCU     2.33    2.33    2.47    3.06    4.55
DRCU     2.86    4.21    8.95    N/A     N/A


Table 1: Relative average running time 


Selected benchmark results (from an x86-64 machine
with 8 processors) are shown in Table 1. Relative running
times of our multithreaded workload are displayed,
normalized so that the shortest measured result takes one
time unit. Columns represent ratios between frequencies
of read-only and read/write operations on the hash table.
Rows represent synchronization mechanisms. RCUs
and RCUc denote our algorithm with its synchronous
and asynchronous reclamation handling API, respectively.
QRCU denotes the feature-constrained RCU
algorithm [4] ported for the sake of comparison. DRCU
("dummy RCU") stands for an implementation of the
RCU API using a plain readers-writer lock.

Since RCU is designed for read-mostly workloads, 
significant improvements over DRCU under high rea- 
ders/writers ratios are not surprising. Interestingly, our 
novel algorithm performed relatively well even under 
low readers/writers ratios. 
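
To make the comparison in Table 1 easier to read, the relative running times can be divided by those of the fastest variant. The sketch below only re-expresses the numbers printed in Table 1; the helper name `slowdown` and the choice of RCUc as baseline are illustrative assumptions, and the N/A entries are simply skipped.

```python
RATIOS = ["511:1", "127:1", "31:1", "7:1", "1:1"]

# Relative average running times copied from Table 1 (None stands for N/A).
TIMES = {
    "RCUc": [1.0, 1.04, 1.06, 1.12, 1.27],
    "RCUs": [1.03, 1.23, 1.48, 2.23, 5.24],
    "QRCU": [2.33, 2.33, 2.47, 3.06, 4.55],
    "DRCU": [2.86, 4.21, 8.95, None, None],
}


def slowdown(mechanism, baseline="RCUc"):
    """Return how many times slower `mechanism` is than `baseline` per ratio."""
    result = {}
    for ratio, t, b in zip(RATIOS, TIMES[mechanism], TIMES[baseline]):
        if t is not None and b is not None:
            result[ratio] = t / b
    return result


for name in ("RCUs", "QRCU", "DRCU"):
    print(name, {r: round(v, 2) for r, v in slowdown(name).items()})
```

For example, at the 31:1 ratio the plain readers-writer lock (DRCU) takes 8.95 / 1.06, i.e. more than eight times as long as RCUc.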
Abstract

This article compares two communication standards
used in healthcare: DASTA, used in the Czech
Republic, and HL7 version 2, used worldwide.
Besides the introduction, the first part describes
the basic principles and content of both standards.
The second part describes the comparison
methodology, proposes an improvement to it and
evaluates the result of the comparison. The
conclusion reflects on the state of use of standards
for system interoperability in Czech healthcare
and on the outlook for the future.

1. Introduction

Healthcare is considered the most information-intensive
sector, and without an emphasis on interoperability
(technical, process, semantic), computers remain for
physicians more of a typewriter and an obstacle
than an effective tool. Better tools, meanwhile, provide
comfort, higher productivity, fewer errors and, above all,
more time with the patient. Informatization of the treatment
process, interoperability between systems and the effort to
implement eHealth are a challenge of our century and at the
same time a possible remedy for the unsustainable demographic
development and the healthcare budget deficit in many
countries. The issue is discussed in the USA [1] [2], at the
level of the European Commission [3] [4] [5], and in the
Czech Republic [6].

Besides the impact of interoperable systems on the everyday
work of physicians, I see an important role for them in
research as well. I believe that, from the very nature of a
physician's experience, there is a subconscious rivalry
among hospitals in the treatment results they achieve.
The effort at self-improvement automatically induces
experimentation with treatment methods, albeit still within
the bounds of recognized clinical procedures and under the
guise of the physician's many years of experience.
Statistical evaluation should then close the loop of the
self-improvement process. Unfortunately, existing clinical
systems do not allow effective use of the recorded data, so
the intended statistical observations become more expensive
or are not carried out at all. Introducing a structured
health record and the principles of interoperability of
clinical systems makes the data entered into these systems
available for purposes other than re-reading at the
patient's next visit. The ability to carry out low-cost
statistical studies will create motivational potential for
self-improvement, comparability and competitiveness of
individual hospitals. The reduced cost of research projects
is obvious. The Zlatokop project at IKEM [7] is clear
evidence of this. The means of interoperability are
standards for data exchange between systems. We compare
DASTA, developed in the Czech Republic, with the
international standard HL7 version 2.


2. DASTA 

DASTA is an abbreviation of DAtový STAndard (data standard);
the designation DS is also commonly used. DASTA was
initially developed by the Czech Society of Health
Informatics and Scientific Information of the Czech Medical
Association of Jan Evangelista Purkyně (ČSZIVI ČLS JEP) [8].
Today, however, it is stated that the National Code List of
Laboratory Items (NČLP), the Data Standard, the ČLP program
for working with code lists of laboratory items, and the
tools for working with the DS and for transferring data
between information systems are the work of an extensive
team of authors from many institutions, faculties, research
institutes and companies, and that the whole work was
created with the financial support of the Ministry of
Health of the Czech Republic [9].

2.1. History

In 2003 the working version DS 03.00.01 and NČLP version
02.05.01 were issued (in June 2003). On 1 November 2003 the
final form DS 03.01.01 was issued together with
NČLP 02.06.01. These standards officially came into force
on 1 January 2004 (through the Bulletin of the Ministry of
Health, part 9, 2003). In the following years, 2004 to 2006,
a total of 11 updates of DS3 and NČLP were issued. In 2006
the development of DS3 was terminated, with the provision
that in 2007 only the NZIS blocks for ÚZIS would be
maintained.

In December 2006, version 4 of the data standard, designated
DS 04.01.01, was published on the website of the Ministry of
Health of the Czech Republic. This standard has been binding
for all DASTA users since 1 January 2007 [9]. Although data
have been written in XML since version DS 02.01,
DS 04.01.01 introduces the revolutionary XML Schema
technology.

Probably because of the Czech Republic's commitment in the
EU accession treaty to harmonize Czech and European
standards, at some point in 2009-2011 the current text of
the standard was moved from the address space of the
Ministry of Health to the website of its main protagonist,
the company Stapro (the target domain of the redirect is
ciselniky.dasta.stapro.cz). Another reason may simply be
an easier process of updating the text of the standard; on
the other hand, the consensual status of DASTA certainly
suffered as a result.


2.2. Data file

The purpose of DASTA was to standardize the automated
transfer of data about a patient and related information.
In all versions, the standard technically represents a
definition (i.e. a format) of a data file. DASTA defines
individual blocks (XML elements) and their structure,
including their mutual nesting. A data file always contains
the main block dasta, to which further data blocks carrying
the transferred information are attached. The data file
really means a file on disk, since the name of the data file
is precisely prescribed by one of the following
combinations:


"UTTXXXXX.KKK"    compressed files
"UTTYYOOD.KKK"    files for ÚZIS ČR
"UTTXXXXX.xml"    uncompressed files
"UTTYYOOD.xml"    files for ÚZIS ČR
"UTTXXXXX.VVS"    uncompressed files


Where:

U       urgency type: S = statim, R = routine,
        T = technical or test,
TT      type of the sending site according to the code list,
KKK     tool used for packing (arj/zip),
XXXXX   string composed of digits and common letters of the
        English alphabet,
YY      last two digits of the year of the reporting period,
OO      period code according to the code list of periods
        (01 = January ... 12 = December, 4x = quarter),
D       sequence number of the batch within the reporting
        period,
VV      version of the data structure,
S       encryption type (N = not encrypted).
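
The naming rules above can be illustrated with a short parsing sketch. It is not part of the DASTA specification: the function name `parse_dasta_name`, the assumption that TT and the body consist of ASCII letters and digits, and the assumption that KKK is literally `arj` or `zip` are all illustrative. Because the patterns UTTXXXXX and UTTYYOOD overlap, the sketch only reports a possible ÚZIS reading when the body looks like YYOOD.

```python
import re
from typing import Optional

# U + TT + five-character body + three-character extension (assumed character sets).
NAME_RE = re.compile(
    r"^(?P<urgency>[SRT])(?P<site>[A-Za-z0-9]{2})"
    r"(?P<body>[A-Za-z0-9]{5})\.(?P<ext>[A-Za-z0-9]{3})$"
)
# YY + OO (01..12 for months, 4x for quarters) + D.
UZIS_BODY_RE = re.compile(r"^(?P<year>\d{2})(?P<period>0[1-9]|1[0-2]|4\d)(?P<batch>\d)$")
URGENCY = {"S": "statim", "R": "routine", "T": "technical or test"}


def parse_dasta_name(name: str) -> Optional[dict]:
    """Split a DASTA data file name into its parts, or return None if it does not fit."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    ext = m["ext"]
    info = {"urgency": URGENCY[m["urgency"]], "site_type": m["site"], "body": m["body"]}
    if ext.lower() == "xml":
        info["packaging"] = "uncompressed XML"
    elif ext.lower() in ("arj", "zip"):
        info["packaging"] = "compressed with " + ext.lower()
    else:
        # VVS form: two characters of structure version plus the encryption type.
        info["packaging"] = "uncompressed, version " + ext[:2] + ", encryption " + ext[2]
    uzis = UZIS_BODY_RE.match(m["body"])
    if uzis is not None:
        info["uzis_reading"] = {
            "year": uzis["year"], "period": uzis["period"], "batch": uzis["batch"],
        }
    return info


print(parse_dasta_name("RAB12345.xml"))
print(parse_dasta_name("SLA07121.zip"))
```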


The resulting communication then takes place by handing
over the data file via any (but previously agreed)
electronic channel: e-mail, shared disk/memory space, FTP,
and formerly also floppy disk.

DASTA does not address the roles of the communicating
parties at all. The same data structure must be used with
many meanings (creating as well as deleting a patient). The
meaning of the transferred structure is determined only
within each installation, by a unique directory chosen in
advance and by agreement (configuration) between the two
communicating parties.

2.3. Data blocks

Data blocks are the basic structural entities of the data
file. Each block must always have a name that is unique
within the entire DS. The name of a data block also
determines the name of the XML element in the data file.
The description of each data block states which items are
transferred in the block, including whether they are
mandatory or optional. Although the definition of a data
block is available both as a DTD and, in higher versions,
as an XML Schema, the descriptive form takes precedence by
definition.

For each block, the DASTA specification contains a brief
description and the use of the block. The specification
further contains a table specifying the types of
information that can be stored in the data block.
Reusability of data structures rests entirely on the
shoulders of the developers, which does not always
succeed [10]. As an example of a block definition, I present
the description of the main data block named dasta (see
Figure 1).

[Figure 1 reproduces the specification page of the block
dasta: "Main block. Root of the graph. Variant for DS4. It
relates to the whole file sent by all senders to a single
recipient. (Distributed since version 4.01.01.)" The table
on the page has the columns kód, T, D, V, plný název (full
name), hodnota (value), podmínky, pokyny, poznámky
(conditions, instructions, notes) and změny (changes). The
legible rows describe the items id_soubor (unique internal
identification of the file), verze_ds (version of the data
structure, in the format xx.xx.xx, e.g. "03.02.01"),
verze_nclp (version of the NČLP used, e.g. "02.07.01"; if
NČLP is not used at all, the lowest version 2.00.00 is
given), bin_priloha (binary data blocks; values T, B; most
often T) and an item giving the purpose and type of the
transferred data (values R, S, U, V, B, C, H, T).]

Figure 1: Sample definition of the block "dasta".


The block descriptions use the capabilities of HTML;
individual definitions and references to code lists are
implemented as hypertext links, which speeds up finding the
required information. The columns have the following
meaning:

kód       identifier for XML purposes,
T         XML type: a = attribute, e = element, d = data,
D         length of the item,
V         occurrence/multiplicity (*, ?, +, count),
hodnota   enumeration of values, reference to a table, or
          not filled in.
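
As a rough illustration of how such a block definition maps to XML, the sketch below builds a root element dasta with a few attributes. It rests on assumptions: the attribute names `id_soubor`, `verze_ds`, `verze_nclp` and `bin_priloha` are taken from Figure 1 as far as they can be read there, the helper `make_dasta_root` and the sample values are hypothetical, and no claim is made that the result validates against the official DASTA XML Schema.

```python
import re
import xml.etree.ElementTree as ET

# Figure 1 prescribes the format xx.xx.xx for both version attributes.
VERSION_RE = re.compile(r"^\d{2}\.\d{2}\.\d{2}$")


def make_dasta_root(id_soubor, verze_ds, verze_nclp, bin_priloha="T"):
    """Build a <dasta> root element with the attributes legible in Figure 1."""
    for name, value in (("verze_ds", verze_ds), ("verze_nclp", verze_nclp)):
        if not VERSION_RE.match(value):
            raise ValueError(f"{name} must have the form xx.xx.xx, got {value!r}")
    if bin_priloha not in ("T", "B"):
        raise ValueError("bin_priloha must be 'T' or 'B'")
    return ET.Element(
        "dasta",
        {
            "id_soubor": id_soubor,      # internal identification of the file
            "verze_ds": verze_ds,        # version of the data structure
            "verze_nclp": verze_nclp,    # version of the NČLP used
            "bin_priloha": bin_priloha,  # handling of binary data blocks
        },
    )


root = make_dasta_root("EXAMPLE_0001", "04.01.01", "02.07.01")
print(ET.tostring(root, encoding="unicode"))
```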


The current version of DASTA is ready to hold and transfer
the following data: information of the data message
envelope (identification of the sender and the addressee),
information about the patient (demographic data, payment
and insurance data, data for NZIS, diagnoses, vaccinations,
dispensed medications, sick leaves, unstructured
anamnesis), clinical events, reporting to ÚZIS, laboratory
values, hygiene and epidemiology values, and reported
procedures.


Also worth noting is a novelty in DS 04: company blocks.
A company block is a specially reserved XML element with a
specific name whose content is not specified any further.
Blocks from different vendors of information systems differ
in the XML namespace used. According to the authors, the
content of these blocks should evolve into the most
suitable form, which will in time become part of the DASTA
standard.

2.4. Code lists

No standard can do without its own code lists. The goal of
every standard is to formalize a certain domain, and code
lists are the direct tool for classifying the possible
states of the described quantities.

According to the DASTA definition, a code list is an ordered
set of values (usually of uniform length). Each value always
has a short and a long textual description. DASTA contains
88 of these so-called simple code lists in total. As an
example, I present the code list NCMPATML (antimicrobial
agents):

Code list NCMPATML

Code list           NCMPATML
Name                Code list of antimicrobial agents
Source              NČLP code lists
Updated             30.12.2006 0:16:40
Key(s)              klic
Number of records   104
Set                 200710
Set of change       200710
NČLP version        02.17.01
DS version          04.01.01
Valid from          1.1.2007
Valid to

KLIC   N32                     N55                     PORADI
AMC    amoxicilin klavulanát   amoxicilin klavulanát   001000
AMF    amfotericin B           amfotericin B           002000
AMI    amikacin                amikacin                003000
AMP    ampicilin               ampicilin               004000
AMS    ampicilin sulbaktam     ampicilin sulbaktam     005000
AMX    amoxicilin              amoxicilin              006000
AZI    azitromycin             azitromycin             007000
AZL    azlocilin               azlocilin               008000
AZT    aztreonam               aztreonam               009000
BAC    bacitracin              bacitracin              010000
BIF    bifonazol               bifonazol               011000
CDR    cefadroxil              cefadroxil              012000
CFC    cefaklor                cefaklor                013000
CIP    ciprofloxacin           ciprofloxacin           014000
CLA    klaritromycin           klaritromycin           015000
CLI    klindamycin             klindamycin             016000
CLT    cefalotin               cefalotin               017000


Figure 2: Sample definition of the code list "antimicrobial agents".


Besides these code lists used directly by DASTA, the
website [9] also contains code lists used for reporting to
NZIS. There are approximately 348 of these. Completely
separate is the NČLP code list, which in its extent and
complexity is comparable to the world-recognized
nomenclatures LOINC or ICD-10.

3. HL7 Version 2

The HL7 version 2 communication standard began to emerge in
1987. In keeping with its time, it primarily addressed the
need for data exchange, i.e. mainly the notation and
structure of the data. Defining data messages as text
documents offered itself as the logical solution. Each
message was to be stored in a separate file, with each line
containing a separate segment (its type is given by three
letters at the beginning of the line). Each segment contains
fields, which are separated from each other by the
character "|". Each field is of a specified data type. The
data type determines the number, order and meaning of
components, usually separated by the character "^". HL7
messages are not based on any reference data model;
relatively high reusability is nevertheless achieved by
repeating the same segment in many messages and by using
more complex data types created specifically for the
exchange of information in healthcare (ČSN ISO 21090:2011).
An example of an HL7 message designated ORU^R01 [11]:

MSH|^~\&||GA00||VAERS PROCES|20010331||ORU^R01...
PID|||1234^^^^SR~1234-12^^^^LR~00725^^^^MR||Doe^J...
NK1|1|Jones^Jane^Lee^^RN|VAB^Vaccine administere...
NK1|2|Jones^Jane^Lee^^RN|FVP^Form completed by ...
ORC|CN|||||||||||1234567^Welby^Marcus^J^Jr^Dr...
OBR|1|||^CDC VAERS-1 (FDA) Report|||20010316|


The first three characters of each line (segment) are the
segment identifier; here they denote: message header (MSH),
patient identification (PID), persons associated with the
patient (Next of Kin, NK1), general information about the
order (Order Common, ORC) and observation request
(Observation Request, OBR). The whole HL7 version 2
standard is divided into the following chapters:

• 1: Introduction

• 2: Control (control of information flow)

• 2A: Control - Data Types

• 2B: Control - Conformance

• 3: Patient Administration

• 4: Order Entry (laboratory orders)

• 5: Query (queries, searching)

• 6: Financial Management (self-payer agenda)

• 7: Observation Reporting (reports of measured
values)

• 8: Master Files (code list server)

• 9: Medical Records/Information Mgmt. (document
flow management)

• 10: Scheduling (scheduling and appointments)

• 11: Patient Referral (referral for examination)

• 12: Patient Care (shared patient care)

• 13: Clinical Laboratory Automation (laboratory
results)

• 14: Application Management

• 15: Personnel Management (human resources)

• 16: eClaims (reporting to the insurer)

• 17: Materials Management (inventory management)


From the third chapter onwards, each chapter contains over
100 pages of message definitions (optionality and
repetition of segments and fields, tabular enumerations of
values), and the meaning of each field is stated explicitly.
This (lengthy) specification of message content substitutes
for the absent reference model, and it creates the
unpleasant possibility of "bending" the meaning of a segment
in a particular message. A sample definition of the fourth
field of the OBR segment is shown in Figure 3.


4.5.3.4 OBR-4 Universal Service Identifier (CWE) 00238

Components: <Identifier (ST)> ^ <Text (ST)> ^ <Name of Coding System (ID)> ^ <Alternate Identifier (ST)> ^
<Alternate Text (ST)> ^ <Name of Alternate Coding System (ID)> ^ <Coding System Version ID (ST)> ^
<Alternate Coding System Version ID (ST)> ^ <Original Text (ST)>

Definition: This field contains the identifier code for the requested observation/test/battery. This can be
based on local and/or "universal" codes. We recommend the "universal" procedure identifier. The structure
of this CE data type is described in the control section.


Figure 3: Sample definition of the 4th field in the OBR segment.


HL7 version 2 determines the roles of the communicating
parties by describing the trigger event that caused the
message to be created. The trigger event identifier is found
in the message header (MSH segment) in the ninth position,
in the second component (R01 in our example). Several
trigger events may give rise to the same message type,
similarly to DASTA, where the same file structure is used
for several purposes. However, thanks to the enumeration of
trigger events, the way the HL7 v2 standard is used is far
more predictable than in the case of DASTA, where the
meaning of a data file arises only through agreement between
the communicating parties and DASTA itself does not address
it.
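
The segment, field and component structure described in this section can be illustrated with a minimal parsing sketch. It assumes the default delimiters "|" and "^", ignores repetitions, escape sequences and subcomponents, and uses hypothetical helper names (`parse_message`, `get_field`, `trigger_event`) and an abbreviated sample message; it is not a replacement for a real HL7 library.

```python
def parse_message(text):
    """Split an HL7 v2 message into segments, each a list of fields."""
    parsed = []
    for line in text.strip().splitlines():
        fields = line.split("|")
        if fields[0] == "MSH":
            # MSH-1 is the field separator itself; re-insert it so that
            # list index n corresponds to field MSH-n.
            fields.insert(1, "|")
        parsed.append(fields)
    return parsed


def get_field(parsed, segment_id, position):
    """Return field `position` of the first matching segment, or None."""
    for fields in parsed:
        if fields[0] == segment_id and position < len(fields):
            return fields[position]
    return None


def trigger_event(parsed):
    """Return the second component of MSH-9, the trigger event code."""
    msh9 = get_field(parsed, "MSH", 9)
    if msh9 is None:
        return None
    components = msh9.split("^")
    return components[1] if len(components) > 1 else None


EXAMPLE = "\n".join([
    r"MSH|^~\&||GA00||VAERS PROCES|20010331||ORU^R01",
    "PID|||1234^^^^SR||Doe^J",
    "OBR|1|||^CDC VAERS-1 (FDA) Report|||20010316|",
])

message = parse_message(EXAMPLE)
print(trigger_event(message))
print(get_field(message, "OBR", 4).split("^"))
```

The same lookup applied to OBR-4 returns the components defined in Figure 3; in this sample only the second one (the text) is filled in.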

4. The IHE Initiative

Integrating the Healthcare Enterprise (IHE) is a worldwide
initiative of healthcare professionals, software vendors
and care providers [13] [14] [15]. Because a particular
data-transfer task (e.g. sharing administrative data about
a patient, handing over X-ray images, etc.) can today be
implemented in several ways and still in accordance with
existing standards, IHE seeks to establish recommended
practices for implementing system interoperability in
practice. Excessive freedom of software vendors in choosing
the method of communication leads to incompatible
technological solutions, albeit in accordance with existing
standards.

Because IHE focuses on concrete implementations of
communication, the compatibility of different products can
be tested for these tasks. IHE organizes meetings of
software vendors called Connectathons, where vendors try to
fine-tune communication with other products and then pass
IHE certification. The results of these meetings are
available in a database of compatible products, so a
healthcare facility that follows IHE recommendations can
make sure in advance that the intended purchase of a
particular piece of software will not become more expensive
when it is integrated into the enterprise architecture.

IHE publishes its specifications in so-called profiles. The
name profile comes from the terminology of the
communication standards used, where any refinement beyond
the general specification is called a local profile. IHE
profiles define concrete data-sharing tasks, identify a
suitable standard and further describe how the standard is
to be used.

Probably the most illustrative example is the task of time
synchronization within a healthcare facility. Although many
readers will suggest the NTP protocol [16] without
hesitation, in practice, besides synchronization at the
workstation level, one also encounters time synchronization
at the level of a client application working over a shared
MySQL database in client-server mode. The IHE Consistent
Time Integration profile [17] therefore recommends a uniform
approach: if consistent time is required, use NTP where a
central time server is available, and in other cases use
SNTP [18] (which is also supported by NTPd).


4.1. IHE PAM Profile

In order to credibly compare the DASTA and HL7 version 2
standards in the same situation, we must add to the HL7
standard a specification of use (an IHE profile) for the
cases in which DASTA is commonly used.
Patient Administration Management (PAM) is a profile from
the "IHE IT Infrastructure" domain. The profile describes
how applications should share patient demographic data with
each other using HL7 version 2. The profile proposes two
topologically different modes of administration: a central
registry, or an architecture of equal local registries.

5. Comparison of DASTA and HL7

An objective comparison of the available communication
standards is important not only for future support of DASTA
development by the Ministry of Health of the Czech
Republic, but above all for the direction of development of
Czech hospital information systems. The first step in
introducing interoperability is to ensure the transfer of
patient data between systems, because along with the
information about patients itself, a uniform patient
identification is distributed as well, duplicates do not
arise and later manual merging of records is avoided.
Therefore, together with colleagues from the EuroMISE
centre, I focused on comparing DASTA and HL7 version 2 (IHE
PAM profile) in the task of transferring patient data.

For the comparison I used the framework published by the
chairman of the HL7 Finland association (Juha Mykkänen)
[19]. This evaluation system contains 9 forms in total.
Each form focuses on a specific topic of the standard's
definition:

• Form 1: Basic information and purpose of the standard,

• Form 2: Content and semantics,

• Form 3: Functionality and interaction,

• Form 4: Application infrastructure,

• Form 5: Technical aspects,

• Form 6: Flexibility of the standard,

• Form 7: Maturity, usability, official status,

• Form 8: System/application life cycle,

• Form 9: Specific extension possibilities.


The framework requires the forms to be evaluated in the
order 1, 9, 2, 3, 4, 5, 6, 7 and 8, where forms 1 and 9 may
be entirely disqualifying; they are used as a coarse sieve.
Each form contains several questions that should be taken
into account before implementing a particular task. Each
question is first rated for its importance ($w_i$) on a
scale of 0 to 3 (0: not an important factor, 1: desirable,
2: highly desirable, 3: mandatory). The question is then
answered from the point of view of the evaluated standard
$j$ on a scale of -3 to +3 (-3: the standard contradicts the
requirement, 0: the standard does not specify, +1: the
standard can support the requirement with the help of an
extension, +2: the requirement is partially supported,
+3: the requirement is fully supported); let us denote this
value $s_{ij}$. After evaluating the whole form, we can
compute the total score of standard $j$ for this form:


$$sc_j = \sum_{i=1}^{n} (w_i \cdot s_{ij}) \qquad (1)$$

The higher the score a standard receives, the more suitable
it should be. As I found during the evaluation, the maximum
achievable score of the different forms depends on the
number of questions in each form. In the total score,
larger forms thus unjustifiably gain in weight. I therefore
proposed correcting the question weight $w_i$ by the number
of questions $n$ in the form, $w'_i = w_i / n$, and hence
the total score of the standard as well: $sc'_j = sc_j / n$.
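
Equation (1) and the proposed correction can be written down directly. The weights and answers below are made-up illustrative values, not data from the evaluation, and the function names are hypothetical.

```python
def form_score(weights, answers):
    """Equation (1): sum of importance weight times answer over all questions."""
    if len(weights) != len(answers):
        raise ValueError("one answer is needed for every question")
    return sum(w * s for w, s in zip(weights, answers))


def corrected_form_score(weights, answers):
    """Score with each weight divided by the number of questions n in the form."""
    n = len(weights)
    return form_score(weights, answers) / n


# Hypothetical form with four questions.
weights = [3, 2, 0, 1]    # importance on the scale 0..3
answers = [3, -3, 2, 1]   # answers for one standard on the scale -3..+3

print(form_score(weights, answers))            # 3*3 + 2*(-3) + 0*2 + 1*1
print(corrected_form_score(weights, answers))  # the same sum divided by n = 4
```

Dividing by n keeps a long form from dominating the total merely because it contains more questions.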

5.1. Result

I evaluated the DASTA standard and HL7 v2 in the IHE PAM
profile for the purpose of exchanging patient data between
systems. The scores achieved in the individual forms,
including the correction for the number of questions in the
form, are given in Table 1; the highest score in each score
column is marked with an asterisk.


Form                                         Questions  Norm.       DASTA           IHE PAM
                                                        relevance   sc_1   sc'_1    sc_2   sc'_2
1: Basic information and purpose             30         2.9         58     1.9      126    4.2
2: Content and semantics                     29         1.8         116*   4.0*     135*   4.7*
3: Functionality and interaction             27         1.5         41     1.5      96     3.6
4: Application infrastructure                14         1.4         16     1.1      53     3.8
5: Technical aspects                         12         0.8         17     1.4      12     1.0
6: Flexibility of the standard               4          1.5         5      1.3      17     4.3
7: Maturity, usability, official status      9          0.9         17     1.9      20     2.2
8: System/application life cycle             9          1.2         16     1.8      15     1.7
9: Specific extension possibilities          17         0.9         22     1.3      42     2.5
Total:                                       151                    308    16.2     516    27.8

Table 1: Result of the evaluation in the individual forms.
5.2. Discussion

If we want to assess the significance of the proposed
correction, we must compare the numbers of questions in the
forms with the sums of the question relevance ratings in the
individual forms. While the three most extensive forms,
nos. 1, 2 and 3, also received the highest sums of
relevance, form no. 6 (Flexibility) moved up among the more
significant ones, whereas form no. 9 (Specific extension
possibilities) lost significance. The change in order
corresponds to reality in our case, since I evaluated the
standards for a very specific use case, so flexibility is
important rather than extension possibilities, which tend
to get in the way during implementation. This is also
reflected in the rating of the HL7 standard for flexibility,
where $sc_2 = 17$ was the third lowest value, but after the
correction for the number of questions it is the second
largest contribution ($sc'_2 = 4.3$) to the total score.

Comparing the ratios of the total scores shows how DASTA
fares relative to HL7 v2: it reaches roughly 60 % of the
HL7 v2 score. We also see that in our case the correction
did not have a large effect on the performance of the
standards in the evaluation.

$$sc_1 / sc_2 = 0.60 ; \quad sc'_1 / sc'_2 = 0.58 \qquad (2)$$

The small effect of the correction on the standards'
performance is probably caused by applying the evaluation
framework to a case very similar to the one for which the
framework was created. Mathematically, this fact is
reflected in the very similar order of the forms when
sorted by the number of questions and by normalized
relevance.
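
The totals and ratios quoted above can be rechecked from the per-form values in Table 1. The sketch below uses only the question counts and uncorrected scores printed in the table; the variable names are illustrative.

```python
# Per form: (number of questions, DASTA score sc_1, IHE PAM score sc_2), from Table 1.
TABLE_1 = {
    1: (30, 58, 126),
    2: (29, 116, 135),
    3: (27, 41, 96),
    4: (14, 16, 53),
    5: (12, 17, 12),
    6: (4, 5, 17),
    7: (9, 17, 20),
    8: (9, 16, 15),
    9: (17, 22, 42),
}

total_dasta = sum(sc1 for _, sc1, _ in TABLE_1.values())
total_pam = sum(sc2 for _, _, sc2 in TABLE_1.values())

# Corrected scores: each form's score divided by its number of questions.
corrected_dasta = sum(sc1 / n for n, sc1, _ in TABLE_1.values())
corrected_pam = sum(sc2 / n for n, _, sc2 in TABLE_1.values())

print(total_dasta, total_pam, round(total_dasta / total_pam, 2))
print(round(corrected_dasta, 1), round(corrected_pam, 1),
      round(corrected_dasta / corrected_pam, 2))
```

The recomputed corrected total for IHE PAM comes from the unrounded per-form quotients, which is why it matches the 27.8 in the table rather than the sum of the rounded column entries.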

6. Conclusion

The evaluation showed that DASTA does not reach the quality
of the HL7 v2 standard even in the most common situation of
transferring patient data.

In terms of information content, the standards are
practically incomparable. HL7 version 2 opens each chapter
by describing the expected situations and outlining the
problems to be solved. The reader thus not only gets an
idea of the thinking of the chapter's authors but, by
comparison with Czech practice, also quickly discovers
further (so far unused) possibilities in the field. Next,
the individual real-world trigger events are specified.
Each trigger event has its own unique code (e.g. A01, S04)
and, when it occurs, causes the transfer of a particular
message type with a defined segment structure. Because each
chapter describes a specific area, new segments containing
the fields needed for data transfer are defined there as
well. The reader very quickly gets an idea of what can be
transferred, under what conditions, in what situations, and
in what sequence the messages follow.


By contrast, DASTA always confines itself to describing the
structure of the data, while the obligation, multiplicity
or non-use of a particular item or block follows from the
logic of the case being handled. Likewise, the trigger
event (and thus the way the message is to be processed)
must be inferred by the receiving party or explicitly
agreed between the communicating parties. The attentive
reader may see here a parallel to exchanging messages about
state changes as opposed to sending documents stating the
final state. And rightly so. In the Czech environment of
all-embracing monolithic hospital information systems
(HIS), there may not even be a need to distribute state
changes, since the HIS itself provides the required
functionality. For a user of the HIS it is then sufficient
to export only the final state to a neighbouring system.

The comparison of the standards was published in the proceedings of the EFMI STC 2011 conference and presented by my colleague Nagy in Laško, Slovenia, as part of the CBI research project ([20]).

6.1. Outlook

I am of the opinion that the Czech lay, medical and IT public is insufficiently informed about medical informatics and eHealth. I see the causes in the frequent changes in the post of Minister of Health, in the absence of a long-term eHealth implementation strategy at the Ministry of Health of the Czech Republic (MZČR), in the understaffing of the MZČR informatics department, and in the technological limits of the monolithic information systems used today, as evidenced by the measured performance of DASTA. Further causes include the general Czech habit of ignoring foreign technological trends, insufficient participation in international standardization bodies, and the de facto non-functioning of the eHealthForum association. General ignorance, together with several deterrent attempts, then leads to physicians' lack of interest in or even aversion to innovation, suppliers' unwillingness to invest in innovative eHealth technologies, the disregard of technological challenges by VZP, the MZČR's fumbling in implementing eHealth, and the general technological renown of IZIP | EZK, which rests on lobbying rather than on technological achievements.

I believe that the introduction of eHealth in the Czech Republic will proceed gradually, at a pace proportional to the capabilities of supply and demand for new technologies. Evidence of this is physicians' current antipathy towards eHealth, hand in hand with the low engagement of medical software producers (see any ČNFeH seminar). On the demand side, hospitals, physicians and nurses must recognize the benefits of eHealth and become willing to invest in innovation. On the supply side, there must be sufficient technological knowledge and tangible competition, so that innovations are not merely an overpriced advertising veneer. The technological development of domestic information system suppliers is proportional to the available capital, the aggressiveness of the competitive environment, and the availability of practical experience with the technologies. The information systems market is currently practically static and confined to the Czech basin, owing to monolithic software solutions, to advantageous data lock-in, and to acquisitions in past years. Unfortunately, the situation is not helped by the approach of the company IZIP, which is dilettantish from the standpoint of medical informatics.
Abstract

In this paper we study the identification of a culprit and the assessment of the evidence against him. We define a simple model called the island problem and derive the weight-of-evidence formula in its basic form. We show how to deal with uncertainty about basic parameters of the model, such as the size of the population. We investigate the possibility of including relatedness and subpopulation structure in the model through the beta-binomial formula, we extend the analysis to DNA mixtures, and in closing we present a brief overview of DNA databases.

1. Introduction 

Technological progress that allows the use of DNA has caused a revolution in criminology. It helps convict the perpetrators of crimes that once appeared unsolvable and also helps prove the innocence of people who have already been convicted. DNA analysis is now accepted by the broad public as a completely standard procedure which reliably convicts the offender. Herein, however, lies one of the main problems of using DNA, for even DNA evidence is not foolproof.

Several possibilities keep DNA from being completely reliable: for example, the trace may be falsely located (more specifically, the offender may have discarded a cigarette butt which had previously been smoked by someone else); biological samples may have been collected incorrectly or damaged; or there may have been secondary transfer of biological material. However, mathematicians do not deal with any of these things. Rather, they are faced with the following task: if all of the above options are excluded, what is the probability that the detained person is the offender, given that the perpetrator's DNA and the DNA of the suspect are available?

In forensic practice, genetic profiles consisting of short tandem repeat (STR) polymorphisms are currently used. The number of polymorphisms varies from country to country, from seven used in Germany to a maximum of sixteen used in the Czech Republic. The probability of correct identification depends on the number of polymorphisms compared (or loci where the studied polymorphisms lie) and on their genetic variability. The more loci we investigate and the greater the variability at the individual loci, the smaller the probability that another person will have the same configuration (and therefore the same genetic profile). Owing to the quality and amount of the biological material it is not always possible to investigate all of the polymorphisms, and very often genetic profiles contain fewer loci than is necessary to identify a person uniquely.

In the following text we will assume that we examine only one locus. Assuming independence of loci, generalization to a larger number of loci can be performed using the product rule (i.e. multiplying the individual marginal probabilities).

2. Formalization 

Notation

• E - evidence or information about the crime (i.e. the circumstances, witness testimonies, crime scene evidence, etc.)

• G - the event that the suspect is guilty

• I - the event that the suspect is innocent

• C_i - the event that the culprit is person i

• X - the population of alternative suspects.
Our goal is to determine the conditional probability $P(G \mid E)$ that, given circumstances E, the suspect is truly the culprit of the investigated crime. According to Bayes' theorem,

$$P(G \mid E) = \frac{P(E \mid G)\,P(G)}{P(E \mid G)\,P(G) + P(E \mid I)\,P(I)}. \tag{1}$$


However, the expression $P(E \mid I)$ cannot be computed directly. The suspect is innocent if and only if there exists an index $i \in X$ for which the event $C_i$ occurs. The event I is therefore equivalent to the event $\bigcup_{i \in X} C_i$, and since the events $C_i$ are disjoint,

$$P(I) = P\Big(\bigcup_{i \in X} C_i\Big) = \sum_{i \in X} P(C_i).$$

Thus

$$P(E \mid I)\,P(I) = P\Big(E \,\Big|\, \bigcup_{i \in X} C_i\Big)\, P\Big(\bigcup_{i \in X} C_i\Big) = \sum_{i \in X} P(E \cap C_i) = \sum_{i \in X} P(E \mid C_i)\,P(C_i).$$


Define the likelihood ratio

$$R_i = \frac{P(E \mid C_i)}{P(E \mid G)}, \tag{2}$$

which expresses how many times the probability of the evidence E is greater under the condition that the culprit is person i than under the condition that the culprit is the suspect. Further define the likelihood weights

$$w_i = \frac{P(C_i)}{P(G)},$$

which express how many times the prior probability that person i committed the crime is greater than the prior probability that the suspect committed it.

Then

$$P(G \mid E) = \frac{1}{1 + \sum_{i \in X} w_i R_i}. \tag{3}$$

The formula (3) is usually called the weight-of-evidence formula.


3. The island problem


The simplest application of the previous part is the "island problem". This is a model where a crime is committed on an inaccessible island which contains, besides the suspect, N people who are unrelated to each other. At the beginning, there is no information about the offender, so we assign to each of the islanders the same (prior) probability of committing the crime. Then the offender is found to possess a certain characteristic T, and the suspect is also found to have that characteristic. The question becomes: to what extent can we be sure that the suspect we have found is truly the culprit?


Using the formula (3) we get

$$P(G \mid E) = \frac{1}{1 + N p}, \tag{4}$$

where p is the probability of T. For example, if p = 0.01 and N = 100 then $P(G \mid E) = 1/2$.
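
The weight-of-evidence formula (3) and its island-problem special case (4) can be sketched in a few lines of Python. The function names `posterior_guilt` and `island_posterior` are illustrative assumptions, not part of the original text; the sketch assumes each alternative suspect is described by one weight $w_i$ and one likelihood ratio $R_i$.

```python
def posterior_guilt(weights, ratios):
    """Weight-of-evidence formula (3): P(G|E) = 1 / (1 + sum_i w_i * R_i)."""
    if len(weights) != len(ratios):
        raise ValueError("one weight and one likelihood ratio per alternative suspect")
    return 1.0 / (1.0 + sum(w * r for w, r in zip(weights, ratios)))


def island_posterior(n_others, p):
    """Island problem (4): equal priors (w_i = 1) and R_i = p for all N others."""
    return posterior_guilt([1.0] * n_others, [p] * n_others)


# The example from the text: p = 0.01 and N = 100 give a posterior of about 1/2.
example = island_posterior(100, 0.01)
print(example)
```

Keeping the general formula separate from the island special case makes it easy to try unequal priors, for instance by lowering the weight of islanders with an alibi.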


The previous result can be modified for more complex (and realistic) situations. Let's see where our simple model can fail:


• Typing and handling errors

As the test may give erroneous results in a small percentage of cases, errors caused by the human factor must also be considered: contamination or replacement of the sample from which the T-status is investigated, incorrect evaluation of the results, or even intentional misrepresentation.

• The population size

Often the population size N is only estimated; furthermore, if there is migration in the population, it is necessary to account for greater uncertainty in the population size.

• The probability of occurrence of T in the population

The value of p is usually unknown and is therefore estimated from the relative frequency of T in a smaller sample or in a similar population about which we have more information. However, these auxiliary data may be outdated or may only partially describe the investigated population.

• Suspect searching

The suspect is not usually chosen randomly from the population but on the basis of other circumstantial evidence which increases the probability of guilt. Another possibility is to choose the suspect by testing persons from the population for the presence of T. In this way, people who are not T-bearers can be excluded and thus the population of alternative suspects is reduced.

• Relatives and population subdivision

If the suspect (or another person being tested) is a T-bearer and some of his relatives are also included in the population, then in the case of a DNA profile the probability of other persons having T increases due to inheritance. Similarly, an unusually high relative frequency of a rare character usually occurs within the same subpopulation due to its shared evolutionary history.

• The same prior probability of committing a crime

Although this requirement intuitively corresponds with the general presumption of innocence, we can assess varying prior probabilities (e.g. based on the distance from the scene, time availability, or a possible alibi).

We will analyze some of these cases in detail in the following sections.

4. Uncertainty about population size


The uncertainty in the size of the population of possible alternative suspects affects the prior probability $P(G)$. Consider the population size $\tilde{N}$ as a random variable with mean N. The prior probability of guilt, conditional on the value $\tilde{N}$, is

$$P(G \mid \tilde{N}) = \frac{1}{\tilde{N} + 1}.$$

However, since $\tilde{N}$ is not known, we use the expectation:

$$P(G) = E\big[P(G \mid \tilde{N})\big] = E\left[\frac{1}{\tilde{N} + 1}\right].$$

The function $1/(\tilde{N} + 1)$ is not symmetric, but is convex on the interval $(0, \infty)$. Therefore Jensen's inequality for convex functions ($E[f(x)] \ge f(E[x])$) implies

$$P(G) = E\left[\frac{1}{\tilde{N} + 1}\right] \ge \frac{1}{N + 1},$$

because $E[\tilde{N}] = N$.

Thus uncertainty about the value N slightly increases the prior probability of guilt, so neglecting it tends to favor the defendant. This effect is usually very small. Let it be shown in a concrete example.

For $\varepsilon \in (0, 0.5)$ we put

$$\tilde{N} = \begin{cases} N - 1 & \text{with probability } \varepsilon, \\ N & \text{with probability } 1 - 2\varepsilon, \\ N + 1 & \text{with probability } \varepsilon. \end{cases}$$

Then

$$P(G) = E\left[\frac{1}{\tilde{N} + 1}\right] = \frac{\varepsilon}{N} + \frac{1 - 2\varepsilon}{N + 1} + \frac{\varepsilon}{N + 2} = \frac{1}{N + 1} + \frac{2\varepsilon}{N(N + 1)(N + 2)} \ge \frac{1}{N + 1},$$

and if we put $\varepsilon = 0.25$ with N = 100 then $P(G)$ is greater than $1/(N + 1)$ by only 0.000000485.

Let's see what the uncertainty in population size causes in formula (4). Since the prior probabilities of the alternative suspects sum to $1 - P(G)$,

$$P(G \mid E) = \frac{1}{1 + p\,\frac{1 - P(G)}{P(G)}} = \frac{1}{1 + p\,\frac{N^3 + 2N^2 - 2\varepsilon}{N^2 + 2N + 2\varepsilon}} = \frac{1}{1 + N p\,\frac{N^3 + 2N^2 - 2\varepsilon}{N^3 + 2N^2 + 2N\varepsilon}}.$$

Again substituting $\varepsilon = 0.25$ and N = 100 (with p = 0.01) we conclude that $P(G \mid E) = 0.5000124$ which, despite the high value of $\varepsilon$, differs from the original result of 50 %, calculated with a fixed N, by just one thousandth of a percent. Therefore, under uncertainty about N,

$$P(G \mid E) \approx \frac{1}{1 + N p\,(1 - 2\varepsilon/N^2)}$$

is a very good approximation. In this example the approximation gives $P(G \mid E) = 0.5000125$, which is 50.00125 %.

Balding [1] uses an approximation that is an order of magnitude worse,

$$P(G \mid E) \approx \frac{1}{1 + N p\,(1 - 4\varepsilon/N^3)},$$

which gives in our example the value $P(G \mid E) = 0.5000003$, or 50.00003 %.
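
The numbers quoted in this section can be re-derived with a short Python sketch. It assumes the three-point distribution of the population size described in the text (N − 1, N, N + 1 with probabilities ε, 1 − 2ε, ε) and equal priors among the alternative suspects; the function names are illustrative assumptions.

```python
def prior_guilt(n, eps):
    """E[1 / (N~ + 1)] for N~ in {n-1, n, n+1} with probabilities eps, 1-2*eps, eps."""
    return eps / n + (1.0 - 2.0 * eps) / (n + 1) + eps / (n + 2)


def posterior_from_prior(prior, p):
    """P(G|E) when the alternative suspects share the prior mass 1 - prior and R_i = p."""
    return 1.0 / (1.0 + p * (1.0 - prior) / prior)


N, eps, p = 100, 0.25, 0.01
prior = prior_guilt(N, eps)
excess = prior - 1.0 / (N + 1)            # effect of uncertainty on the prior
exact = posterior_from_prior(prior, p)    # posterior with uncertain N
approx = 1.0 / (1.0 + N * p * (1.0 - 2.0 * eps / N**2))
balding = 1.0 / (1.0 + N * p * (1.0 - 4.0 * eps / N**3))

print(f"{excess:.3e}  {exact:.7f}  {approx:.7f}  {balding:.7f}")
```

Comparing `exact` with `approx` and `balding` shows directly how close each approximation stays to the exact posterior for the chosen N, ε and p.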


5. Relatives and population structure


Alleles which are identical and come from a common ancestor are called identical by descent (ibd). The commonality of recent evolutionary history between two persons, whether relatives or members of the same subpopulation, increases the probability of occurrence of ibd alleles. Therefore, the coancestry coefficient $\theta$, indicating the probability that two randomly selected alleles at a fixed locus are ibd, is used as the measure of relatedness within subpopulations. Neglecting the influence of kinship and population structure leads to an overestimation of the posterior probability of the suspect's guilt, and therefore ignoring this influence tends to disfavor the defendant. Thus, this topic is given considerable attention.

Consider a given locus with J alleles $A_1, \ldots, A_J$ whose probabilities of occurrence in the population are $p_1, \ldots, p_J$, $\sum_{i=1}^{J} p_i = 1$. Allele proportions in the subpopulation can be modeled by the Dirichlet distribution ([5]) with parameters $\lambda p_i$, $\lambda = \frac{1 - \theta}{\theta (1 - k)}$, where $\theta$ is the coancestry coefficient characterizing the subpopulation and k is the proportion of the subpopulation within the general population. Thus the probability of drawing $m_i$ alleles $A_i$ ($\sum_{i=1}^{J} m_i = n$) is given by

$$P(m_1, \ldots, m_J) = \frac{\Gamma(\lambda)}{\Gamma(\lambda + n)} \prod_{i=1}^{J} \frac{\Gamma(\lambda p_i + m_i)}{\Gamma(\lambda p_i)}. \tag{5}$$

Putting $m = (m_1, \ldots, m_J)$ we can adjust formula (5) to

$$P(m) = \frac{\prod_{j=1}^{J} \prod_{i=0}^{m_j - 1} \left[(1 - \theta) p_j + \theta i (1 - k)\right]}{\prod_{i=0}^{n - 1} \left[1 - \theta + \theta i (1 - k)\right]}. \tag{6}$$

The formula (6) is usually called the beta-binomial sampling formula and applies to ordered samples. If we want to use unordered samples, it is necessary to multiply the result by the multinomial coefficient $\frac{n!}{m_1! \cdots m_J!}$.

From the formula (6) we can also deduce the probability of drawing a certain allele, given our knowledge of the alleles drawn previously:

$$P(m_j + 1 \mid m_1, \ldots, m_j, \ldots, m_J) = \frac{(1 - \theta) p_j + m_j \theta (1 - k)}{1 - \theta + n \theta (1 - k)}. \tag{7}$$


5.1. Application of beta-binomial formula


Denote $G_c$ and $G_s$ as the culprit and suspect genotypes, respectively, and denote $G_i$ as the genotype of a general person i. Then the likelihood ratio (2) can be rewritten as

$$R_i = \frac{P(G_c = G_s = D \mid C_i)}{P(G_c = G_s = D \mid G)} = \frac{P(G_i = G_s = D)}{P(G_s = D)} = P(G_i = D \mid G_s = D).$$

Suppose first that the culprit has a homozygous profile $A_j A_j$. Then the likelihood ratio is the probability that person i has the same homozygous profile as the suspect:

$$R_i = P(G_i = A_j A_j \mid G_s = A_j A_j) = P(A_j^2 \mid A_j^2) = P(A_j \mid A_j^3) \cdot P(A_j \mid A_j^2).$$

We know how to calculate these conditional probabilities using (7). First we put $m_j = n = 2$ and then $m_j = n = 3$. Therefore

$$R_i = \frac{(1 - \theta) p_j + 2\theta (1 - k)}{1 - \theta + 2\theta (1 - k)} \cdot \frac{(1 - \theta) p_j + 3\theta (1 - k)}{1 - \theta + 3\theta (1 - k)}. \tag{8}$$

Similarly, we proceed for a culprit with a heterozygous profile $A_j A_k$:

$$R_i = P(G_i = A_j A_k \mid G_s = A_j A_k) = P(A_j A_k \mid A_j A_k) = P(A_j \mid A_j A_k)\,P(A_k \mid A_j^2 A_k) + P(A_k \mid A_j A_k)\,P(A_j \mid A_j A_k^2). \tag{9}$$

To quantify both expressions on the bottom line we put $m_j = 1$, n = 2 and $m_k = 1$, n = 3; and $m_k = 1$, n = 2 and $m_j = 1$, n = 3, respectively. In total

$$R_i = 2\,\frac{(1 - \theta) p_j + \theta (1 - k)}{1 - \theta + 2\theta (1 - k)} \cdot \frac{(1 - \theta) p_k + \theta (1 - k)}{1 - \theta + 3\theta (1 - k)}. \tag{10}$$


6. DNA mixtures


If the DNA sample is found to have more than two alleles at one locus, then it is defined as a mixture. The number of contributors to the mixture can be known or estimated, usually as $\lceil n/2 \rceil$, where n is the maximum number of alleles detected. Due to the large number of situations which may arise, we show for illustration only the case in which the victim (V) and one other person contribute to the mixture.

In this case the likelihood ratio $R_i$, defined by formula (2), can be rewritten as

$$R_i = \frac{P(E_c, G_s, G_v \mid C_i)}{P(E_c, G_s, G_v \mid G)} = \frac{P(E_c \mid G_s, G_v, C_i)}{P(E_c \mid G_s, G_v, G)} \cdot \frac{P(G_s, G_v \mid C_i)}{P(G_s, G_v \mid G)} = \frac{P(E_c \mid G_s, G_v, C_i)}{P(E_c \mid G_s, G_v, G)} = \frac{P(E_c \mid G_v, C_i)}{P(E_c \mid G_s, G_v, G)}. \tag{11}$$
6.1. Four alleles mixture

First we look at the case where the mixture consists of four alleles.

Suppose the following conditions apply:

1. None of the persons are considered relatives of each other.

2. The population is homogeneous (i.e. $\theta = 0$).

3. The population follows Hardy-Weinberg equilibrium.

Let the mixture be made up of alleles A, B, C, and D, with known probabilities of occurrence in the total population $p_A$, $p_B$, $p_C$, and $p_D$. Also let the suspect have alleles A and B and let the victim have alleles C and D. Then the denominator in the formula (11) is equal to one, the numerator is equal to the probability of observing a person with alleles A and B (which under the conditions above is $2 p_A p_B$), and therefore the likelihood ratio is

$$R_i = 2 p_A p_B.$$

Suppose now that all considered persons have the same degree of relatedness to each other, as expressed by the coancestry coefficient $\theta$. Then according to (7)

$$R_i = P(AB \mid ABCD) = \frac{2 \left[(1 - \theta) p_A + \theta (1 - k)\right] \left[(1 - \theta) p_B + \theta (1 - k)\right]}{\left[1 - \theta + 4\theta (1 - k)\right] \left[1 - \theta + 5\theta (1 - k)\right]}.$$

6.2. Three alleles mixture

In the case of three alleles in the sample it is necessary to assume at least two contributors to the mixture. Consider alleles A, B, and C with probabilities $p_A$, $p_B$, and $p_C$. If the victim is homozygous for allele C, we get the same results as in the four-allele mixture.

Assume that the victim is heterozygous with alleles A and B and that the suspect is homozygous for allele C. Furthermore, assume that conditions 1 to 3 are fulfilled. Then the denominator of the formula (11) is again equal to one, the numerator is equal to the probability of observing a person who has the allele C and has no allele other than A, B, or C, and

$$R_i = P(AC) + P(BC) + P(CC) = 2 p_A p_C + 2 p_B p_C + p_C^2. \tag{12}$$

To include the population structure we use the formula (7) again:

$$R_i = P(AC \mid ABCC) + P(BC \mid ABCC) + P(CC \mid ABCC)$$

$$= \frac{2 \left[(1 - \theta) p_A + \theta (1 - k)\right] \left[(1 - \theta) p_C + 2\theta (1 - k)\right]}{\left[1 - \theta + 4\theta (1 - k)\right] \left[1 - \theta + 5\theta (1 - k)\right]} + \frac{2 \left[(1 - \theta) p_B + \theta (1 - k)\right] \left[(1 - \theta) p_C + 2\theta (1 - k)\right]}{\left[1 - \theta + 4\theta (1 - k)\right] \left[1 - \theta + 5\theta (1 - k)\right]} + \frac{\left[(1 - \theta) p_C + 3\theta (1 - k)\right] \left[(1 - \theta) p_C + 2\theta (1 - k)\right]}{\left[1 - \theta + 4\theta (1 - k)\right] \left[1 - \theta + 5\theta (1 - k)\right]}$$

$$= \frac{(1 - \theta) p_C + 2\theta (1 - k)}{1 - \theta + 4\theta (1 - k)} \cdot \frac{(1 - \theta)(2 p_A + 2 p_B + p_C) + 7\theta (1 - k)}{1 - \theta + 5\theta (1 - k)}.$$

We assumed in the previous calculation that the suspect is homozygous for allele C. If he is heterozygous with alleles A and C, or B and C respectively, formula (12) remains unchanged under conditions 1 to 3. If population structure is included, the likelihood ratio is

$$R_i = \frac{(1 - \theta) p_C + \theta (1 - k)}{1 - \theta + 4\theta (1 - k)} \cdot \frac{(1 - \theta)(2 p_A + 2 p_B + p_C) + 8\theta (1 - k)}{1 - \theta + 5\theta (1 - k)}.$$
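
The likelihood ratios built from the conditional sampling formula (7) can be checked numerically. The following Python sketch is illustrative only: the function names and the allele frequencies in the example are assumptions chosen for demonstration, not values from the text. It evaluates the homozygous and heterozygous match probabilities (8) and (10) and the three-allele mixture with victim AB and suspect CC, both as a sum of three terms and in the factored form.

```python
def draw_prob(p_j, m_j, n, theta, k):
    """Formula (7): probability of drawing allele j after it was seen m_j times in n draws."""
    return ((1 - theta) * p_j + m_j * theta * (1 - k)) / (1 - theta + n * theta * (1 - k))


def match_homozygous(p_j, theta, k):
    """Formula (8): person i is A_j A_j given the suspect is A_j A_j."""
    return draw_prob(p_j, 2, 2, theta, k) * draw_prob(p_j, 3, 3, theta, k)


def match_heterozygous(p_j, p_k, theta, k):
    """Formula (10): person i is A_j A_k given the suspect is A_j A_k."""
    return 2 * draw_prob(p_j, 1, 2, theta, k) * draw_prob(p_k, 1, 3, theta, k)


def mixture_three_alleles(p_a, p_b, p_c, theta, k):
    """Victim AB, suspect CC: P(AC|ABCC) + P(BC|ABCC) + P(CC|ABCC)."""
    ac = 2 * draw_prob(p_a, 1, 4, theta, k) * draw_prob(p_c, 2, 5, theta, k)
    bc = 2 * draw_prob(p_b, 1, 4, theta, k) * draw_prob(p_c, 2, 5, theta, k)
    cc = draw_prob(p_c, 2, 4, theta, k) * draw_prob(p_c, 3, 5, theta, k)
    return ac + bc + cc


def mixture_three_alleles_factored(p_a, p_b, p_c, theta, k):
    """The same likelihood ratio in the factored form given in the text."""
    first = draw_prob(p_c, 2, 4, theta, k)
    second = ((1 - theta) * (2 * p_a + 2 * p_b + p_c) + 7 * theta * (1 - k)) / (
        1 - theta + 5 * theta * (1 - k))
    return first * second


# Hypothetical allele frequencies and population parameters, for illustration only.
p_a, p_b, p_c, theta, k = 0.10, 0.20, 0.05, 0.03, 0.10
lr_sum = mixture_three_alleles(p_a, p_b, p_c, theta, k)
lr_factored = mixture_three_alleles_factored(p_a, p_b, p_c, theta, k)
print(lr_sum, lr_factored)
```

Setting `theta = 0` recovers the Hardy-Weinberg values $p_j^2$, $2 p_j p_k$ and formula (12), which is a convenient sanity check on any implementation.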

7. DNA database 

DNA profiles, as sequences of alphanumeric data, allow relatively easy storage in a database. Therefore national databases began to be created in the late 1990s and have continued to function since then. Currently there are three major forensic DNA databases: the Combined DNA Index System (CODIS), which is maintained by the United States FBI; the European Network of Forensic Science Institutes (ENFSI) DNA database; and the Interpol Standard Set of Loci (ISSOL) database maintained by Interpol.

All of these systems divide the DNA database into two sub-databases. The crime scene database stores the biological samples collected at crime scenes, and the convicted offender database stores the genetic profiles of persons convicted in the past. These two databases are compared with one another, and any match of profiles is examined by qualified professionals.

The types of offenses for which DNA is stored differ among countries and states. Initially, these databases contained only samples from violent offenders, such as those convicted of aggravated assault, rape, or murder. However, the value of obtaining DNA from offenders of less severe crimes has been recognized more in recent times, as it has been discovered that many small-time criminals become repeat offenders, and in some cases more violent future offenders. Moreover, the power of a large bank of DNA samples can sometimes serve as a deterrent. A match of DNA evidence from a crime scene (which would then be logged in the crime scene database) to a profile in the convicted offender database solves the crime rapidly and efficiently, saving time, effort, and money. Conversely, the use of DNA evidence can also immediately prove a suspect's innocence ([6]).

According to data from the United States from August 2006, the crime scene database included approximately 150 000 profiles and the convicted offender database more than 3 500 000 profiles ([2]). The national database of the United Kingdom currently consists of over four million profiles and increases monthly by forty to fifty thousand. The success of this approach has been confirmed by the increase in the share of solved crimes from twenty-four to forty-three percent within the United Kingdom since the creation of the DNA database.

Therefore, the database system has the support of the public. On the negative side, DNA often reveals very sensitive personal information, and it is therefore necessary that the databases are kept confidential and are thoroughly protected from abuse.

The Czech national database was created in 2002. After rapid development, the database now contains approximately ninety thousand genetic profiles.
Abstract

Covariance Matrix Adaptation (CMA) is a stochastic optimization algorithm working in generations with populations of solutions. It has become the state of the art in Evolution Strategies (ES) for solving multidimensional, multimodal and noisy optimization problems. The paper introduces the subject of ES and details the original version of CMA-ES. Its advantages, consisting mainly in its invariance properties, are summarized, as well as its disadvantages, namely the tuning of five parameters and no time improvement when working with larger populations.

We give a survey of recent methods that deal with these limitations. We also describe a CMA-ES based algorithm for multi-objective optimization.

1. Introduction 

Evolution Strategies (ES), a subfield of Evolutionary Algorithms (EA), are stochastic numerical optimization methods for solving optimization problems, i.e. finding the global optimum of an objective function $f : R^n \to R$; this function is called the fitness function in EA terms. ESs make few assumptions about the objective function in comparison to classical optimization techniques: e.g. the function need not be smooth, and no assumption is made about its convexity or linearity. An ES works iteratively; one iteration is called a generation. In each generation, a set (called the population) of candidate solutions to the optimization problem exists.

The population of solutions is sorted according to their fitness values. ESs employ rank-based selection using several schemes [1]. A $(\mu, \lambda)$ scheme samples $\lambda$ new individuals (offspring) in every generation from $\mu$ parents; the $\mu$ best offspring are selected as parents for the next generation and no current parent is passed to the next generation. A $(\mu + \lambda)$ scheme also samples $\lambda$ offspring and selects the $\mu$ best individuals as parents; however, it selects them from the union of parents and offspring. If an individual can be put into the new generation without being mutated, the algorithm is called elitist; thus the $(\mu, \lambda)$ scheme is not elitist, in contrast to the $(\mu + \lambda)$ one.
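
The difference between the two selection schemes can be illustrated with a minimal Python sketch. It assumes minimization and represents individuals as plain numbers; the function names are illustrative assumptions rather than part of any ES library.

```python
def comma_selection(parents, offspring, mu, fitness):
    """(mu, lambda) scheme: parents are discarded, the mu best offspring survive."""
    return sorted(offspring, key=fitness)[:mu]


def plus_selection(parents, offspring, mu, fitness):
    """(mu + lambda) scheme: the mu best of parents and offspring together survive."""
    return sorted(parents + offspring, key=fitness)[:mu]


parents = [0.1, 5.0]
offspring = [1.0, 2.0, 3.0]
next_comma = comma_selection(parents, offspring, mu=2, fitness=abs)
next_plus = plus_selection(parents, offspring, mu=2, fitness=abs)
print(next_comma, next_plus)
```

In this toy example the good parent 0.1 survives only under the plus scheme, which is exactly what makes that scheme elitist.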

Sampling of new individuals (offspring) is done by means of a mutation operator, which usually adds Gaussian random noise to a parent candidate solution. Formally, given a random variable of candidate solution $X \in R^n$, with $x^{(g)}$ a realization in generation g and $m^{(g)}$ the current favourite solution, the following equations are equivalent:

$$x^{(g+1)} = m^{(g)} + \sigma N(0, C^{(g)}) \tag{1}$$

$$x^{(g+1)} = m^{(g)} + N(0, \sigma^2 C^{(g)}) \tag{2}$$

$$x^{(g+1)} = m^{(g)} + \sigma \sqrt{C^{(g)}}\, N(0, I), \tag{3}$$

where $N(0, C)$ represents a realization of the multivariate normal distribution with zero mean and symmetric positive definite covariance matrix C, which describes pairwise dependencies between the one-dimensional variables of the search space. This distribution is called the search distribution. $\sigma$ is a positive value called the step size. ESs differ in how the search distribution is constructed. I stands for the identity matrix.

Adaptation of the search parameters, C and $\sigma$, to the optimization problem is the subject of the ES algorithms described below. In this paper we give a description of a state-of-the-art ES algorithm, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), with its properties (Section 2). The subsequent sections deal with modifications that enhance the algorithm in a specific way. Section 3 summarizes methods that reduce the number of generations needed to reach the optimum. In Section 4 it is argued that the covariance matrix should be proportional to the inverse of the Hessian matrix of the fitness function. Section 5 introduces a simplified CMA-ES algorithm that requires only two parameters to be tuned. Section 6 reviews CMA-ES algorithms for multi-objective optimization problems.


2. Original CMA-ES algorithm

CMA-ES employs two basic principles for adapting the parameters of the search distribution. The first, referred to as derandomization, consists in deterministically increasing the probability of successful (i.e. better than their parent) candidate solutions and search steps by the maximum likelihood method. This is done by updating the mean of the distribution so that the likelihood of previously successful candidate solutions is maximized. The covariance matrix is also updated at every generation so that the likelihood of previously taken steps is maximized.

Second, two sequences of successive steps are recorded; they are called evolution paths and are expressed as a sum of consecutive steps. This summation is referred to as cumulation. One path is concerned with covariance matrix adaptation and the other with global step size adaptation.

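The $(\mu, \lambda)$-CMA-ES [2] generates $\lambda$ offspring by

$$x_k^{(g+1)} = \langle x \rangle_\mu^{(g)} + \sigma^{(g)} B^{(g)} D^{(g)} z_k^{(g+1)}, \quad k = 1, \ldots, \lambda, \tag{4}$$

where $z_k^{(g+1)} \sim N(0, I)$, so that $B^{(g)} D^{(g)} z_k^{(g+1)} \sim N(0, C^{(g)})$ (the matrices $B^{(g)}$ and $D^{(g)}$ come from the decomposition of $C^{(g)}$ given below), and

$$\langle x \rangle_\mu^{(g)} = \sum_{i \in I_{sel}^{(g)}} w_i x_i^{(g)} \tag{5}$$

corresponds to $m^{(g)}$ in (1); it is the weighted mean of the $\mu$ best individuals in generation g, $I_{sel}^{(g)}$ is the set of indices of the selected individuals of generation g, and the weights $w_i$ sum to one over the best $\mu$ individuals.

A common choice is the ordinary mean, $w_i = 1/\mu$.

The positive definite covariance matrix $C^{(g)}$ can be decomposed as

$$C^{(g)} = B^{(g)} D^{(g)} \left(B^{(g)} D^{(g)}\right)^T = B^{(g)} \left(D^{(g)}\right)^2 \left(B^{(g)}\right)^T,$$

which is the eigenvalue decomposition of $C^{(g)}$, where $B^{(g)}$ is an orthogonal matrix whose columns are normalized eigenvectors of $C^{(g)}$ and $D^{(g)}$ is the diagonal matrix of square roots of the eigenvalues. Thus $N(0, C^{(g)}) = B^{(g)} D^{(g)} N(0, I)$.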
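
The identity $N(0, C) = B D\, N(0, I)$ can be checked numerically. The NumPy sketch below is illustrative: the covariance matrix, the sample size and the function name are assumptions chosen for the demonstration. It uses `numpy.linalg.eigh`, which returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix.

```python
import numpy as np


def sample_search_distribution(C, size, rng):
    """Draw `size` samples from N(0, C) as B D z with z ~ N(0, I)."""
    eigvals, B = np.linalg.eigh(C)      # C = B diag(eigvals) B^T
    D = np.sqrt(eigvals)                # square roots of the eigenvalues
    z = rng.standard_normal((size, C.shape[0]))
    return (z * D) @ B.T, B, D          # each row is (B D z_k)^T


C = np.array([[4.0, 1.5],
              [1.5, 1.0]])              # hypothetical covariance matrix
rng = np.random.default_rng(1)
samples, B, D = sample_search_distribution(C, 200_000, rng)

reconstructed = B @ np.diag(D**2) @ B.T  # B D^2 B^T should reproduce C
empirical = np.cov(samples, rowvar=False)
print(reconstructed)
print(empirical)
```

The reconstructed matrix matches C up to rounding, while the empirical covariance of the samples approaches C as the sample size grows.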

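As already mentioned, $C^{(g)}$ is adapted by means of the evolution path

$$p_c^{(g+1)} = (1 - c_c) \cdot p_c^{(g)} + c_c^u \cdot c_w B^{(g)} D^{(g)} \langle z \rangle_\mu^{(g+1)},$$

where $\langle z \rangle_\mu^{(g+1)}$ is the weighted mean of the selected $z_k^{(g+1)}$, defined analogously to (5), and

$$c_c^u = \sqrt{c_c (2 - c_c)}, \qquad c_w = \frac{\sum_{i=1}^{\mu} w_i}{\sqrt{\sum_{i=1}^{\mu} w_i^2}}.$$

$c_c$ determines the cumulation time for $p_c$, which is roughly $1/c_c$; $c_c^u$ normalizes the variance of $p_c$, as $(1 - c_c)^2 + (c_c^u)^2 = 1$. $c_w$ is chosen so that under random selection $c_w \langle z \rangle_\mu^{(g+1)}$ and $z_k^{(g+1)}$ have the same variance and are identically distributed. Details of the derivation can be found in [2]. Finally, the covariance matrix is adapted with the rank-one matrix $p_c^{(g+1)} \left(p_c^{(g+1)}\right)^T$:

$$C^{(g+1)} = (1 - c_{cov}) \cdot C^{(g)} + c_{cov} \cdot p_c^{(g+1)} \left(p_c^{(g+1)}\right)^T. \tag{6}$$

Note also that $E\left[p_c^{(g+1)} \left(p_c^{(g+1)}\right)^T\right] = C^{(g)}$ was shown in [3]. This explains the usage of $(1 - c_{cov})$ in conjunction with $c_{cov}$ in (6).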

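The global step size $\sigma^{(g)}$ is adapted via another evolution path $p_\sigma^{(g+1)}$, where the scaling with $D^{(g)}$ is omitted:

$$p_\sigma^{(g+1)} = (1 - c_\sigma) \cdot p_\sigma^{(g)} + c_\sigma^u \cdot c_w B^{(g)} \langle z \rangle_\mu^{(g+1)}, \tag{7}$$

$$\sigma^{(g+1)} = \sigma^{(g)} \cdot \exp\left(\frac{1}{d_\sigma} \cdot \frac{\left\|p_\sigma^{(g+1)}\right\| - \hat{\chi}_n}{\hat{\chi}_n}\right), \tag{8}$$

where $\hat{\chi}_n = E\left[\|N(0, I)\|\right] = \sqrt{2}\, \Gamma\!\left(\frac{n+1}{2}\right) / \Gamma\!\left(\frac{n}{2}\right)$ is the expected length of an $N(0, I)$ distributed random vector.


Altogether, the so-called strategy parameters $c_c$, $c_{cov}$, $c_\sigma$, $d_\sigma$ need to be tuned, apart from $\lambda$ and $\mu$.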


Algorithm 1 $(\mu, \lambda)$-CMA-ES
input $c_c$, $c_{cov}$, $c_\sigma$, $d_\sigma$
initialize $\sigma = 1$, $p_\sigma = p_c = 0$, $C = I$, $\langle x \rangle \in N(0, I)$
repeat
    for $i = 1, \ldots, \lambda$ do
        $x_i \leftarrow \langle x \rangle + \sigma^{(g)} N(0, C^{(g)})$
        $f_i = fitness(x_i)$
    end for
    sort $x_i$ according to fitness
    update mean (5)
    update evolution paths $p_c$ and $p_\sigma$ (7)
    update covariance matrix (6)
    update $\sigma$ (8)
until termination criterion is met
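
Algorithm 1 can be sketched as runnable Python with NumPy. This is a minimal illustration, not the reference implementation: it uses equal weights $w_i = 1/\mu$, a fixed number of generations as the termination criterion, and default values for $c_c$, $c_{cov}$, $c_\sigma$ and $d_\sigma$ that are assumptions made here for the example (the text does not specify them). The function name `cma_es` and the sphere test function are likewise illustrative.

```python
import math
import numpy as np


def cma_es(fitness, x0, sigma=1.0, mu=3, lam=12, generations=300, seed=0):
    """Minimise `fitness` with a (mu, lambda)-CMA-ES using equal weights."""
    rng = np.random.default_rng(seed)
    n = len(x0)

    # Strategy parameters: assumed illustrative defaults, not taken from the text.
    c_c = 4.0 / (n + 4.0)
    c_sigma = 4.0 / (n + 4.0)
    c_cov = 2.0 / (n + math.sqrt(2.0)) ** 2
    d_sigma = 1.0 / c_sigma + 1.0

    w = np.full(mu, 1.0 / mu)                      # ordinary mean, w_i = 1/mu
    c_w = w.sum() / math.sqrt((w ** 2).sum())
    # chi_n = E||N(0, I)|| = sqrt(2) * Gamma((n + 1) / 2) / Gamma(n / 2)
    chi_n = math.sqrt(2.0) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))

    mean = np.asarray(x0, dtype=float)
    C = np.eye(n)
    p_c = np.zeros(n)
    p_sigma = np.zeros(n)

    for _ in range(generations):
        # Eigendecomposition C = B D^2 B^T (eigenvalues clipped for safety).
        eigvals, B = np.linalg.eigh(C)
        D = np.sqrt(np.maximum(eigvals, 1e-20))

        # Sample lambda offspring, equation (4): x_k = <x> + sigma * B D z_k.
        z = rng.standard_normal((lam, n))
        x = mean + sigma * (z * D) @ B.T
        f = np.array([fitness(xk) for xk in x])

        # (mu, lambda) selection: only the mu best offspring are kept.
        best = np.argsort(f)[:mu]
        mean = w @ x[best]                         # equation (5)
        z_mean = w @ z[best]

        # Evolution path for C and rank-one covariance update, equation (6).
        p_c = (1 - c_c) * p_c + math.sqrt(c_c * (2 - c_c)) * c_w * (B @ (D * z_mean))
        C = (1 - c_cov) * C + c_cov * np.outer(p_c, p_c)

        # Step-size path (7) and step-size update (8).
        p_sigma = ((1 - c_sigma) * p_sigma
                   + math.sqrt(c_sigma * (2 - c_sigma)) * c_w * (B @ z_mean))
        sigma *= math.exp((np.linalg.norm(p_sigma) - chi_n) / (d_sigma * chi_n))

    return mean, float(fitness(mean))


# Usage on a 5-dimensional sphere function (a hypothetical test problem).
best_x, best_f = cma_es(lambda v: float(np.sum(v ** 2)), x0=np.full(5, 3.0))
print(best_f)
```

Recomputing the eigendecomposition in every generation keeps the sketch close to the equations; it is also the cost that the Cholesky variant of Section 3.2 is designed to avoid.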


[Figure 1: three panels, g=1, g=2, g=3]

Figure 1: Illustration of how CMA-ES works: adaptation of the covariance matrix (dashed ellipse) with sampled individuals across generations. The shape of the distribution can adapt to the landscape of the optimization problem.

Fig. 1 depicts a possible development over time of candidate solutions and the covariance matrix they were sampled with.


The properties of CMA-ES were concisely described in [4]. We describe the majority of them here.

Stationarity: The update equations satisfy unbiasedness of variations of the search and strategy parameters. Under neutral selection, with $x_i \sim N\left(\langle x \rangle^{(g)}, (\sigma^{(g)})^2 C^{(g)}\right)$, we find that

$$E\left[\ln \sigma^{(g+1)} \mid \sigma^{(g)}\right] = \ln \sigma^{(g)}.$$

Note that the unbiasedness of $\ln \sigma$ does not imply unbiasedness of $\sigma$. Actually, $E\left[\sigma^{(g+1)} \mid \sigma^{(g)}\right] > \sigma^{(g)}$. A bias toward increase or decrease can cause divergence or premature convergence, respectively, in cases when the selection pressure is low. However, for noisy data a controlled increase in the bias can be advantageous.

Invariance: Invariance properties of an optimization algorithm imply uniform performance on a class of fitness functions, which makes it possible to generalize and predict the future performance of the algorithm on different sets of data. Generally, translation invariance is required of any mathematical optimization algorithm, including CMA-ES. CMA-ES further exhibits invariance under strictly monotonic transformations of the fitness values, since the algorithm depends only on the sorted order of the fitness function values. Also, invariance under rotation and reflection of the search space is preserved.

In practice, the drawback is the tuning of four parameters, the selection of which was empirically studied in the original article, resulting in ad hoc rules, but no theoretical studies were conducted to support them. Another drawback of the approach is slow convergence and no time improvement in the case of larger populations. In the next sections, we describe algorithms that try to address these drawbacks, and a CMA-ES algorithm for multi-objective optimization is introduced.


3. Time complexity reduction of CMA-ES

3.1. Adding higher rank information

In CMA-ES, the covariance matrix is updated in every generation with the outer product of the evolution path $p_c$, which is a symmetric $n \times n$ matrix of rank one. Hansen et al. [5] argue that the information contained in larger populations can be exploited by adding higher rank information with the term

$$Z^{(g+1)} = \frac{1}{\mu} \sum_{i \in I_{sel}^{(g+1)}} B^{(g)} D^{(g)} z_i^{(g+1)} \left(B^{(g)} D^{(g)} z_i^{(g+1)}\right)^T. \tag{9}$$

Equation (6) is modified to

$$C^{(g+1)} = (1 - c_{cov}) \cdot C^{(g)} + c_{cov} \cdot U^{(g+1)}, \tag{10}$$

where

$$U^{(g+1)} = \alpha_{cov}\, p_c^{(g+1)} \left(p_c^{(g+1)}\right)^T + (1 - \alpha_{cov}) \cdot Z^{(g+1)}, \tag{11}$$

and $\alpha_{cov}$ is the tuning parameter, $0 < \alpha_{cov} < 1$. The other equations remain unchanged. Decreasing $\alpha_{cov}$ puts greater weight on the new higher rank information and lower weight on the original rank-one information, while increasing $\alpha_{cov}$ shifts the weights the other way. It has been shown in [5] that $E\left[Z^{(g+1)}\right] = C^{(g)}$, so clearly $E\left[C^{(g+1)}\right] = C^{(g)}$; therefore the coefficient $(1 - \alpha_{cov})$ is used in conjunction with $\alpha_{cov}$ in (10) and (11).


3.2. Efficient covariance matrix decomposition 

Igel et al. [6] propose the (1+1)-Cholesky-CMA-ES, which
replaces the computationally expensive eigenvalue
decomposition of the covariance matrix with the Cholesky
decomposition, running the rank-one update directly on the
Cholesky factors. Given a symmetric positive definite
matrix $C$, the Cholesky decomposition puts
$$C = A A^T, \tag{12}$$
with $A$ a lower triangular matrix with strictly positive
diagonal entries. Assume $p_c^{(g)} = A^{(g)} z^{(g)}$, $z^{(g)} \sim \mathcal{N}(0, I)$.
In combination with (12) and (6), the Cholesky
factor $A^{(g+1)}$ can be shown to be equal to
$$A^{(g+1)} = c_a A^{(g)} + \frac{c_a}{\|z^{(g)}\|^2} \left( \sqrt{1 + \frac{(1 - c_a^2)\, \|z^{(g)}\|^2}{c_a^2}} - 1 \right) p_c^{(g)} (z^{(g)})^T, \tag{13}$$
with $c_a = \sqrt{1 - c_{cov}}$. For details of the derivation and the
suggested parameter setting see [6].

The evolution path $p_c$ is not employed here, since its
update would stall whenever the offspring is not successful,
i.e. its fitness is worse than that of its parent. This would
cause divergence of $\sigma$ if $p_c$ were long. Therefore, the
cumulative step size adaptation is replaced by a success
rule based step size control [7]. Let $p_{succ} = \lambda_{succ} / \lambda$
denote the success rate, which is the proportion of
individuals $\lambda_{succ}$ that have better fitness than their parent.
Then the so-called average success rate $\bar{p}_{succ}$ smooths the
step size $\sigma$ of a general $(1 + \lambda)$-CMA-ES by
$$\bar{p}_{succ}^{(g+1)} = (1 - c_c)\, \bar{p}_{succ}^{(g)} + c_c\, p_{succ}, \tag{14}$$
$$\sigma^{(g+1)} = \sigma^{(g)} \exp\left( \frac{1}{d} \left( \bar{p}_{succ}^{(g+1)} - \frac{p_{succ}^{target}}{1 - p_{succ}^{target}} \left( 1 - \bar{p}_{succ}^{(g+1)} \right) \right) \right), \tag{15}$$
where $p_{succ}^{target}$, the target success probability, is a new
strategy parameter and $d$ is a damping parameter. The update
implements the heuristic that the step size should be increased
if the success rate is high and decreased if the success rate is
low. The rule is reflected in (15): for $\bar{p}_{succ} > p_{succ}^{target}$ the
argument of the exponential is greater than zero, resulting in
an increase of $\sigma$. For $\bar{p}_{succ} = p_{succ}^{target}$ the argument is zero and
no change in $\sigma$ takes place. Finally, if $\bar{p}_{succ} < p_{succ}^{target}$, the
argument is lower than zero, which results in a decrease
of $\sigma$.
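The success rule in (14) and (15) can be illustrated with a minimal Python sketch. The function names and the numeric values of the target success probability and the damping parameter are assumptions chosen only for illustration; they are not taken from the text above.

```python
import math

def update_success_rate(p_bar, p_succ, c):
    """Exponential smoothing of the success rate, as in (14)."""
    return (1.0 - c) * p_bar + c * p_succ

def update_step_size(sigma, p_bar, p_target, d):
    """Success-rule step size update, as in (15)."""
    arg = (p_bar - p_target / (1.0 - p_target) * (1.0 - p_bar)) / d
    return sigma * math.exp(arg)

# Hypothetical parameter values, used only to show the three regimes.
p_target, d = 2.0 / 11.0, 3.5
grown = update_step_size(1.0, 0.5, p_target, d)        # success rate above target
same = update_step_size(1.0, p_target, p_target, d)    # success rate equal to target
shrunk = update_step_size(1.0, 0.05, p_target, d)      # success rate below target
```

With these inputs the step size grows, stays (numerically) unchanged, and shrinks, respectively, which mirrors the three cases discussed for (15).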


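The (1+1)-Cholesky-CMA-ES inherits all invariance
properties from the original version. It reads as listed in
Algorithm 2.

In the updates (16)-(19) of the (1+1)-CMA-ES (Algorithm 3),
$$x_{step} = \frac{x_{parent}^{(g+1)} - x_{parent}^{(g)}}{\sigma_{parent}^{(g)}},$$
and the updates used otherwise, i.e. when $\bar{p}_{succ} \ge p_{thresh}$, are
$$p_c^{(g+1)} = (1 - c_c)\, p_c^{(g)}, \tag{18}$$
$$C^{(g+1)} = (1 - c_{cov})\, C^{(g)} + c_{cov} \left( p_c^{(g+1)} (p_c^{(g+1)})^T + c_c (2 - c_c)\, C^{(g)} \right). \tag{19}$$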

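Algorithm 3 (1+1)-CMA-ES
input: $p_{succ}^{target}$, $p_{thresh}$, $c_c$, $c_{cov}$
initialize: $\sigma = 1$, $\bar{p}_{succ} = 0$, $p_c = 0$, $C = I$, $x_{parent} \sim \mathcal{N}(0, I)$
repeat
    $x_{offspring}^{(g+1)} = x_{parent}^{(g)} + \sigma^{(g)} \mathcal{N}(0, C^{(g)})$
    update $\bar{p}_{succ}$ by (14) and $\sigma$ by (15)
    if fitness($x_{offspring}$) $\le$ fitness($x_{parent}$) then
        $x_{parent} \leftarrow x_{offspring}$
        if $\bar{p}_{succ} < p_{thresh}$ then
            update $p_c^{(g+1)}$, $C^{(g+1)}$ by (16), (17)
        else
            update $p_c^{(g+1)}$, $C^{(g+1)}$ by (18), (19)
        end if
    end if
until termination criterion is met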


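Algorithm 2 (1+1)-Cholesky-CMA-ES
input: $c_{cov}$
initialize: $x_{parent} \sim \mathcal{N}(0, I)$, $\sigma = 1$, $A = I$
repeat
    $x_{offspring}^{(g+1)} = x_{parent}^{(g)} + \sigma^{(g)} A^{(g)} \mathcal{N}(0, I)$
    update $\bar{p}_{succ}$ by (14) and $\sigma$ by (15)
    if fitness($x_{offspring}$) $\le$ fitness($x_{parent}$) then
        $x_{parent}^{(g+1)} \leftarrow x_{offspring}^{(g+1)}$
        update $A^{(g+1)}$ by (13)
    end if
until termination criterion is met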
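The factor update referred to as (13) in Algorithm 2 can be checked numerically. The following Python sketch assumes NumPy and the standard form of the update from [6]; the function name `cholesky_rank_one_update` and the test data are hypothetical. It verifies that the updated factor reproduces the rank-one covariance update without ever forming a decomposition of the new covariance matrix.

```python
import numpy as np

def cholesky_rank_one_update(A, z, c_cov):
    """Return A_new with A_new @ A_new.T == (1 - c_cov) * A @ A.T + c_cov * (A @ z)(A @ z)^T."""
    c_a = np.sqrt(1.0 - c_cov)
    z_norm2 = float(z @ z)
    v = A @ z  # the sampled step expressed in search-space coordinates
    factor = (c_a / z_norm2) * (np.sqrt(1.0 + (1.0 - c_a**2) * z_norm2 / c_a**2) - 1.0)
    return c_a * A + factor * np.outer(v, z)

# Illustrative data: a random positive definite covariance and a random step.
rng = np.random.default_rng(0)
n, c_cov = 4, 0.1
A = np.linalg.cholesky(np.cov(rng.standard_normal((n, 50))))
z = rng.standard_normal(n)

A_new = cholesky_rank_one_update(A, z, c_cov)
C_direct = (1.0 - c_cov) * A @ A.T + c_cov * np.outer(A @ z, A @ z)
max_err = np.abs(A_new @ A_new.T - C_direct).max()
```

Note that the updated matrix is a valid factor of the new covariance matrix but is in general no longer triangular; the update costs only matrix-vector and outer products.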


The (1+1)-CMA-ES, see e.g. [8], differs in that the Cholesky
factor and its respective updates are absent; the update
of $C$ is employed instead. It introduces a new strategy
parameter $p_{thresh}$, uses the evolution path $p_c$, and
depends on the averaged success rate $\bar{p}_{succ}$. If
$\bar{p}_{succ} < p_{thresh}$, the update follows (16) and (17).

If $\bar{p}_{succ}$ is high, i.e. above the threshold $p_{thresh}$, the update of
the evolution path $p_c$ is stalled. This prevents an excessively
fast update of the covariance matrix when the step size
is small. In that case, $p_c$ is updated
only by exponential smoothing (18): the second
summand in the update of $p_c$ is missing (in comparison
with the first case), which is compensated by the term
$c_c (2 - c_c)$ in (19).


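For $\bar{p}_{succ} < p_{thresh}$, the updates of the (1+1)-CMA-ES read
$$p_c^{(g+1)} = (1 - c_c)\, p_c^{(g)} + \sqrt{c_c (2 - c_c)}\; x_{step}, \tag{16}$$
$$C^{(g+1)} = (1 - c_{cov})\, C^{(g)} + c_{cov}\, p_c^{(g+1)} (p_c^{(g+1)})^T. \tag{17}$$

Suttorp et al. [9] extend the Cholesky-CMA-ES by
introducing the inverse of $A^{(g)}$, which allows the
transformation $z^{(g)} = (A^{(g)})^{-1} p_c^{(g)}$, with the inverse factor updated as
$$(A^{(g+1)})^{-1} = \frac{1}{c_a} (A^{(g)})^{-1} - \frac{1}{c_a \|z^{(g)}\|^2} \left( 1 - \frac{1}{\sqrt{1 + \frac{1 - c_a^2}{c_a^2} \|z^{(g)}\|^2}} \right) z^{(g)} (z^{(g)})^T (A^{(g)})^{-1}. \tag{20}$$
This improves the time complexity of one generation
from $O(n^3)$ to $O(n^2)$.

4. Approximating the Hessian of the fitness function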

Auger [10] argues that the original CMA-ES
does not make optimal use of previously sampled
points when updating the covariance matrix. A better
use of the points lies in learning the local curvature of
the fitness function. The rationale behind this is shown
on the optimization of sphere functions and then on
elliptic functions.

Consider first the sphere functions, where the objective
is to minimize the function $f_s(x) = x^T x$. Numerical
experiments [11] showed that in ES, the optimal covariance
matrix is the identity matrix, which was supported
by theoretical studies that put forward dynamical step
size [12].

For elliptic functions, $f_e(x) = \frac{1}{2} x^T H x$, with a positive
definite symmetric matrix $H$, the optimization problem
is solved by a change of variables that turns the problem
into the sphere one. The matrix $H$ can be decomposed
using eigenvalue decomposition $H = P \Lambda^2 P^T$, with an
orthonormal matrix $P$ having the eigenvectors of $H$ as
columns and a diagonal matrix $\Lambda$ of the square roots of the
eigenvalues of $H$. If we let $W = \Lambda^{-1} P X$, then it is easy to see
that $f_e(W) = f_s(X)$ and that the mutation operator (1)
with $C = H^{-1}$ transforms $W_0$ into $W_0 + \sigma \mathcal{N}(0, I)$.

If we consider $I$ to be the best choice for the covariance
matrix for the sphere problem, then $H^{-1}$ is the best
choice for the covariance matrix for the elliptic problem.

Given that the gradient and the Hessian of the objective
function exist, the function can be locally approximated by a
Taylor series of second order, resulting in an elliptic equation.
Suppose we want to approximate the Hessian matrix at a
point $x_0$ and we have a set of $N$ points $x_i$, $i = 1, \ldots, N$, in the
vicinity of $x_0$ together with their fitness values. The gradient
$\nabla$ and the Hessian matrix $H$ can then be found by solving the
linear least squares problem
$$\min_{\nabla, H} \sum_{i=1}^{N} \left( f(x_i) - f(x_0) - (x_i - x_0)^T \nabla - \frac{1}{2} (x_i - x_0)^T H (x_i - x_0) \right)^2. \tag{21}$$
The unknowns are $\nabla$ ($d$ elements) and $H$ ($d(d+1)/2$
elements). If we have more than $d(d+3)/2$ sample
points, the overdetermined linear system of equations
corresponding to (21) can be solved by means of
pseudo-inversion at the cost of $O(d^6)$. Note that for
elliptic functions the least squares value reaches 0, thus
$\nabla$ and $H$ can be determined exactly. For non-elliptic
functions, the minimum is non-zero. Therefore, a metric
for determining the quality of the approximation was
developed in [10].
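The least squares problem (21) is linear in the entries of the gradient and the Hessian, so it can be set up as an ordinary design matrix. The sketch below assumes NumPy and uses `numpy.linalg.lstsq` as the pseudo-inverse solver; the function name `fit_gradient_hessian` and the test function are hypothetical and only illustrate the idea, not the implementation of [10].

```python
import numpy as np

def fit_gradient_hessian(x0, f0, X, fX):
    """Least-squares fit of a gradient g and a symmetric Hessian H around x0, as in (21)."""
    d = x0.size
    D = X - x0                           # rows are x_i - x0
    iu = np.triu_indices(d)              # the d(d+1)/2 independent entries of H
    # Quadratic features 0.5 * D_j * D_k, doubled off the diagonal because H_jk = H_kj.
    quad = 0.5 * D[:, iu[0]] * D[:, iu[1]]
    quad[:, iu[0] != iu[1]] *= 2.0
    M = np.hstack([D, quad])             # N x d(d+3)/2 design matrix
    theta, *_ = np.linalg.lstsq(M, fX - f0, rcond=None)
    g = theta[:d]
    H = np.zeros((d, d))
    H[iu] = theta[d:]
    H = H + H.T - np.diag(np.diag(H))    # mirror the upper triangle
    return g, H

# Illustrative elliptic test function with a known Hessian.
rng = np.random.default_rng(1)
d = 3
B = rng.standard_normal((d, d))
H_true = B @ B.T + d * np.eye(d)
f = lambda x: 0.5 * x @ H_true @ x
x0 = rng.standard_normal(d)
X = x0 + 0.1 * rng.standard_normal((2 * d * (d + 3), d))
g_fit, H_fit = fit_gradient_hessian(x0, f(x0), X, np.array([f(x) for x in X]))
```

For an elliptic function such as this one the residual is zero up to rounding, so the fitted Hessian matches the true one; for non-elliptic functions the residual could serve as an input to a quality measure.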

The proposed algorithm is the $(1, \lambda)$-LS-CMA-ES; $\sigma$ is
updated as in the original version. The Hessian matrix is
calculated every $n_{upd}$ iterations. If the approximation is
sufficient, then the covariance matrix is updated by
$$C^{(g+1)} = (H^{(g)})^{-1}, \tag{22}$$
where $H^{(g)}$ is the Hessian matrix in generation $g$. In
the next $n_{upd}$ generations, this matrix is used without
updates and only the step size gets updated. If the
approximation is poor, the update switches to the mode of
standard CMA-ES for the next $n_{upd}$ generations.


5. Simplified CMA-ES

The article [13] proposes a radical simplification of the
covariance learning rule and the $\sigma$-self-adaptation
approach. The new algorithm is called Covariance Matrix
Self-Adaptation Evolution Strategy (CMSA-ES).
Neither of the evolution paths with exponential smoothing
is used. For each individual in the
population, a mutation strength $\sigma_i^{(g+1)}$ is generated by the
log-normal rule
$$\sigma_i^{(g+1)} = \langle \sigma \rangle^{(g)} \exp[\tau \mathcal{N}(0, 1)], \quad i = 1, \ldots, \lambda, \tag{23}$$
with $\langle \sigma \rangle^{(g)}$ the mean of the mutation strengths of the selected
individuals, and a correlated random direction vector is generated,
$$s_i^{(g+1)} = \mathcal{N}(0, C^{(g)}), \tag{24}$$
resulting in an offspring
$$x_i^{(g+1)} = \langle x \rangle^{(g)} + \sigma_i^{(g+1)} s_i^{(g+1)}. \tag{25}$$
The matrix $S^{(g+1)}$ is formed from the row vectors $s_i^{(g+1)}$ of the
$\mu$ selected individuals. The covariance matrix is updated by
$$C^{(g+1)} = (1 - c_{cov})\, C^{(g)} + c_{cov}\, \frac{1}{\mu} (S^{(g+1)})^T S^{(g+1)}. \tag{26}$$
Here, only two parameters ($c_{cov}$, $\tau$) need to be tuned,
for which the authors provide hints. The empirical
results showed that for large population sizes, the original
CMA-ES is outperformed by CMSA-ES in terms of the
number of generations needed to converge to a (near)
optimal solution.
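One generation of the scheme in (23)-(26) is short enough to sketch in Python with NumPy. The function name `cmsa_es`, the population sizes, and the two learning rates `tau` and `c_cov` are assumptions made for this illustration (they follow commonly quoted recommendations but are not stated in the text above); selection keeps the `mu` best of `lam` offspring.

```python
import numpy as np

def cmsa_es(f, x0, sigma0=1.0, lam=20, mu=5, generations=200, seed=0):
    """Minimal CMSA-ES loop following (23)-(26); parameter settings are assumptions."""
    rng = np.random.default_rng(seed)
    n = x0.size
    tau = 1.0 / np.sqrt(2.0 * n)                       # assumed learning rate in (23)
    c_cov = 1.0 / (1.0 + n * (n + 1) / (2.0 * mu))     # assumed learning rate in (26)
    x_mean, sigma_mean, C = x0.astype(float), sigma0, np.eye(n)
    for _ in range(generations):
        A = np.linalg.cholesky(C)
        sigmas = sigma_mean * np.exp(tau * rng.standard_normal(lam))   # (23)
        S = rng.standard_normal((lam, n)) @ A.T                        # (24): rows s_i ~ N(0, C)
        X = x_mean + sigmas[:, None] * S                               # (25)
        best = np.argsort([f(x) for x in X])[:mu]                      # indices of the mu best
        x_mean = X[best].mean(axis=0)
        sigma_mean = sigmas[best].mean()
        C = (1.0 - c_cov) * C + c_cov * (S[best].T @ S[best]) / mu     # (26)
    return x_mean

sphere = lambda x: float(x @ x)
x_start = np.full(5, 3.0)
x_end = cmsa_es(sphere, x_start)
```

The sketch has no evolution path and no separate step-size rule: both the step size and the covariance matrix are learned from the selected individuals alone.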


6. CMA-ES for multi-objective optimization


Multi-objective optimization (MO) is concerned with
optimizing several, often conflicting, criteria. The
article [8] studies the usage of CMA-ES in MO while preserving
the invariance properties of CMA-ES.

In MO, $m$ fitness functions $f_1, \ldots, f_m$, forming an
objective vector $f(x) = (f_1(x), \ldots, f_m(x))^T$, are
minimized. Given solutions $x, x' \in \mathbb{R}^n$, we say $x$
dominates $x'$, written as $x \prec x'$, iff $\forall i \in \{1, \ldots, m\}$:
$f_i(x) \le f_i(x')$ and $\exists i \in \{1, \ldots, m\}$: $f_i(x) < f_i(x')$.
The elements of a (Pareto) set $\Omega_x = \{x \mid x \in \mathbb{R}^n \wedge \nexists x' \in \mathbb{R}^n : x' \prec x\}$
are not dominated by any element and are
called Pareto-optimal. The Pareto set $\Omega_x$ forms a Pareto
front $\{f(x), x \in \Omega_x\}$. Given no additional information,
no Pareto-optimal solution can be said to be superior to
any other. The goal of MO is to find a diversified Pareto
set.
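The dominance relation and the resulting non-dominated set can be written down directly from the definitions above. The following Python sketch works on finite lists of objective vectors; the function names `dominates` and `pareto_set` and the sample data are hypothetical.

```python
def dominates(fx, fy):
    """True if objective vector fx dominates fy (all objectives are minimized)."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better

def pareto_set(points):
    """Return the objective vectors that are not dominated by any other vector."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Illustrative objective vectors for m = 2 objectives.
objective_vectors = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (2.0, 2.0)]
front = pareto_set(objective_vectors)
```

In this sample, the vector (3.0, 3.0) is dominated by (2.0, 2.0), while the remaining vectors are mutually incomparable; identical vectors do not dominate each other because the strict inequality never holds.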

MO-CMA-ES ranks the individuals based on the level
of non-dominance, which is inspired by the NSGA-II
algorithm [14]. To rank individuals on the same level,
two additional criteria were developed. The first is the
crowding-distance, which provides an estimate of the
density surrounding the non-dominated solution. It gives
a higher rank to individuals contributing more to the
diversity of the objective vector. The second is the method of
contributing hypervolume, which ranks best those
individuals that contribute most to the hypervolume of the
Pareto front. Both secondary criteria cause the
resulting algorithm not to be invariant to order-preserving
transformations of the fitness function.

The algorithm for multi-objective optimization is
referred to as $\lambda_{MO} \times (1+1)$-MO-CMA-ES. It contains a
population of $\lambda_{MO}$ elitist (1+1)-CMA-ES, described
previously in Algorithm 3. In every generation, each
individual $k$, $k = 1, \ldots, \lambda_{MO}$, generates one offspring. The
step size and covariance matrix of each offspring and its
parent are updated according to the (1+1)-CMA-ES. All
the parents and offspring are put in one set, which is then
ranked, and $\lambda_{MO}$ individuals are selected from it as parents for
the next generation. Ranking, line 14 in the subsequent
algorithm, is based on non-dominance in the first place,
and on crowding-distance or contributing hypervolume in
the second place.


Algorithm 4 $\lambda_{MO} \times (1+1)$-MO-CMA-ES

 1: input: …, $c_{cov}$, $c_c$, …
 2: initialize: $x_{par,k} \sim \mathcal{N}(0, I)$, $C_k = I$, $p_c = 0$
 3: repeat
 4:   for $k = 1, \ldots, \lambda_{MO}$ do
 5:     $x_{ind,k}^{(g+1)} = x_{par,k}^{(g)} + \sigma_k^{(g)} \mathcal{N}(0, C_k^{(g)})$
 6:     $f_{ind,k}$ = fitness($x_{ind,k}^{(g+1)}$)
 7:     $f_{par,k}$ = fitness($x_{par,k}^{(g)}$)
 8:   end for
 9:   $Q^{(g)} = \{ f_{ind,k}, f_{par,k} \mid 1 \le k \le \lambda_{MO} \}$
10:   for $k = 1, \ldots, \lambda_{MO}$ do
11:     update $\sigma$ of $x_{ind,k}$ using (14) and (15)
12:     update $\sigma$ of $x_{par,k}$ using (14) and (15)
13:   end for
14:   if $f_{ind,k} \le f_{par,k}$ then
15:     $x_{parent} \leftarrow x_{offspring}$
16:     if $\bar{p}_{succ} < p_{thresh}$ then
17:       update by (16), (17)
18:     else
19:       update by (18), (19)
20:     end if
21:   end if
22:   select $\lambda_{MO}$ individuals as parents for the next generation
23: until termination criterion is met


7. Summary


This work surveyed Covariance Matrix Adaptation, the
state-of-the-art stochastic optimization method in
Evolution Strategies. The algorithm was described with a
discussion of its pros, above all the invariance properties,
and cons, particularly the number of strategy parameters
and almost no time improvement for large populations.
Several modifications were briefly described.

The CMSA-ES reduces the number of tuning parameters.
Time complexity was reduced by modifications
based on a less time-consuming decomposition of the
covariance matrix. LS-CMA-ES updates the covariance matrix
by approximating the Hessian of the fitness function.
Finally, a variant for multi-objective optimization was
introduced.

Future work lies in a wider study of the CMA-ES
based algorithms and in enhancing the algorithm with new
properties. Currently, we seek to incorporate the
copula approach [15]. We also study how the response surface
methodology [16] can be applied in CMA-ES.
Abstract

In dynamic classifier aggregation, the fuzzy
integral is often used as an aggregation operator.
As the fuzzy measure of the integral, the Sugeno
$\lambda$-measure (which belongs to a more general class
of $\perp$-decomposable fuzzy measures) is used
most often. However, there is usually no explicit
reason why this particular measure is used, and
moreover, the measure cannot model the similarities
of the individual classifiers in the team. In
this paper, we show that $\perp$-decomposable measures
are not appropriate for classifier combining,
and we introduce the Interaction-Sensitive
Fuzzy Measure (ISFM), designed specifically
for classifier combining. The experiments with
3 different classifier systems on 26 benchmark
datasets show that ISFM outperforms the Sugeno
$\lambda$-measure in most cases.

1. Introduction 

This paper is an extension of [1], in which we introduced
the Interaction-Sensitive Fuzzy Measure. In this paper,
we discuss the ISFM in more detail and perform more
experiments.

Classifier combining methods are a popular tool for
improving the quality of classification. Instead of using just
one classifier, a team of classifiers is created, and the
predictions of the team are combined into a single
prediction [2-4]. There are two main approaches to classifier
combining: classifier selection (where a single classifier
from the team is selected for prediction according
to some criterion) and classifier aggregation (where the
outputs of the classifiers are aggregated into a single
prediction). Classifier combination can be either static, i.e.,
the combining process is the same for all patterns, or
dynamic, where the combination process is adapted to the
currently classified pattern [5-9].

One of the popular aggregation operators is the fuzzy
integral [2,10-12]. The fuzzy integral aggregates the
outputs of the individual classifiers in the team with respect
to a fuzzy measure, representing the classification
confidences. A fuzzy measure is a generalization of the additive
probabilistic measure, where additivity is replaced
by a weaker condition, monotonicity. This gives us a
tool which can model interactions between different
elements of the fuzzy measure space. However, due to the
lack of additivity, the fuzzy measure needs to be defined
on all subsets of the fuzzy measure space, resulting in
$2^r$ defining values for finite cases, where $r$ is the size of
the universe. There are several approaches to overcome
this weakness: symmetric fuzzy measures [10], for which
the value of the measure depends only on the number
of elements in the argument, and $\perp$-decomposable
fuzzy measures, including the Sugeno $\lambda$-measure [10, 11],
for which the fuzzy measure values are computed from
the fuzzy measure values for the singletons (called fuzzy
densities) using a fixed t-conorm $\perp$. However, since the
value of a set of elements is computed only using the
fuzzy densities of its elements and a fixed $\perp$, the
similarity of the elements in the set is not taken into account,
and the ability to model interactions between different
elements of the fuzzy measure universe is limited.

In the literature of classifier aggregation, the fuzzy integral
is usually used with the Sugeno $\lambda$-measure. There is usually
no explicit reason for the choice of this measure other
than its simplicity. The Sugeno $\lambda$-measure is a special case of
a $\perp$-decomposable fuzzy measure, and as such, it cannot
model similarities between the individual classifiers, and
thus the contribution of using the fuzzy integral is unclear.

In classifier aggregation, we usually try to create a team
of classifiers that are not similar. This property is called
diversity [13]. There are many methods for building a
diverse team of classifiers [3,14-16]; however, the team
always contains classifiers that are similar. If we use
the fuzzy integral with a symmetric or $\perp$-decomposable
fuzzy measure, we are not able to incorporate the
diversity into the measure (and thus into the aggregation
process), because the fuzzy measure of a union of two sets
is a function only of the fuzzy measures of the two sets,
regardless of the similarity of the elements in the sets.

To overcome this weakness, we have introduced an
Interaction-Sensitive Fuzzy Measure (ISFM) [1], which
is defined using the fuzzy measure values for the
singletons (fuzzy densities), and the similarities of the
elements in the universe. If the fuzzy measure space
corresponds to the team of classifiers, the fuzzy measure
incorporates both the classification confidence (fuzzy
densities), and the diversity of the team of classifiers
(mutual similarities of the classifiers). Using ISFM in the
fuzzy integral as an aggregation operator in classifier
aggregation, the aggregation process involves all the
important properties: the predictions of the classifiers, the
classification confidences, and the diversity of the team.

Our preliminary experiments with ISFM used with the
Choquet integral in Random Forest ensembles have
shown that ISFM outperforms the Sugeno $\lambda$-measure [1]. In
this paper, the results of a more profound investigation
are reported, and the experiments have been extended to
cover the Sugeno integral and also other classification
models, namely ensembles of k-Nearest Neighbor
classifiers [17] created by bagging [14] and ensembles of
Quadratic Discriminant Classifiers [17] created by the
Multiple feature subset method [18].

The paper is structured as follows. In Section 2, we
briefly summarize the formalism of classification,
classification confidence, and classifier combining. Section 3
describes fuzzy measures, fuzzy integrals, and their use
in classifier aggregation. In Section 4, we introduce the
ISFM, and in Section 5, we experimentally compare the
performance of the ISFM to the performance of the Sugeno
$\lambda$-measure. Section 6 then summarizes the paper.


2. Classifier Combining 

In this section, we recall the formalism of dynamic
classifier combining, proposed in [5]. Throughout the rest of
the paper, we use the following notation. Let $\mathcal{X} \subseteq \mathbb{R}^n$
be an n-dimensional feature space, and let $C_1, \ldots, C_N \subseteq \mathcal{X}$,
$N \ge 2$, be disjoint sets called classes. A pattern is a
tuple $(x, c_x)$, where $x \in \mathcal{X}$ are the features of the pattern,
and $c_x \in \{1, \ldots, N\}$ is the index of the class the pattern
belongs to. The goal of classification is to determine the
class a given pattern belongs to, i.e., to predict $c_x$ for
unclassified patterns. We assume that for
every $x \in \mathcal{X}$, there is a unique classification $c_x$, but
since it is usually not known, we will sometimes refer to
a pattern only as $x \in \mathcal{X}$.

Definition 1 The term classifier denotes a mapping
$\phi : \mathcal{X} \to [0,1]^N$, i.e., for $x \in \mathcal{X}$,
$\phi(x) = (\gamma_1(x), \ldots, \gamma_N(x))$. The components
$(\gamma_1(x), \ldots, \gamma_N(x))$ are called degrees of classification
(d.o.c.) to each class.

The d.o.c. to class $C_j$ expresses the predicted extent to
which the pattern belongs to class $C_j$. The prediction
of $c_x$ for an unknown pattern $x$ is done by converting
the continuous d.o.c. of the classifier into a crisp output
$\phi^{(cr)}(x) = \arg\max_{i=1,\ldots,N} \gamma_i(x)$ if there are no ties, or
arbitrarily as $\phi^{(cr)}(x) \in \arg\max_{i=1,\ldots,N} \gamma_i(x)$ in the
case of ties.

2.1. Classification Confidence 

In addition to the classifier output (the d.o.c.), which
predicts to which class a pattern belongs, we will work with
the confidence of the prediction, i.e., the extent to which
we can "trust" the output of the classifier.

Definition 2 Let $\phi$ be a classifier and $\kappa_\phi : \mathcal{X} \to [0,1]$.
Then $\kappa_\phi$ is called a confidence measure and for $x \in \mathcal{X}$,
$\kappa_\phi(x)$ is called the classification confidence of $\phi$ on $x$.
A confidence measure is called static if it is a constant
of the classifier, and dynamic otherwise.

The higher the trust in the classification, the closer
$\kappa_\phi(x)$ is to 1. Static confidence measures evaluate the
classifier as a whole and they are usually computed
on a validation set after the classifier is trained. The
methods include accuracy, precision, sensitivity,
resemblance, etc. [17,19]. For example, the Global Accuracy
confidence measure is defined as:
$$\kappa_\phi^{(GA)} = \frac{\sum_{(y, c_y) \in \mathcal{V}} I(\phi^{(cr)}(y) = c_y)}{|\mathcal{V}|}, \tag{1}$$
where $\mathcal{V} \subseteq \mathcal{X} \times \{1, \ldots, N\}$ is the validation set and $I$
denotes the indicator operator, defined as $I(\text{true}) = 1$,
$I(\text{false}) = 0$ (we will use the notation in the rest of the
paper).


PhD Conference ’ 11 


134 


ICS Prague 



David Štefka 


ISFM in Dynamic Classifier Aggregation: an Experimental Comparison 


Dynamic confidence measures [5-9, 20] adapt to the
currently classified pattern and predict the local quality
of the classification for the particular pattern $(x, c_x)$. An
example of a dynamic confidence measure is the
Euclidean Local Accuracy (ELA):
$$\kappa_\phi^{(ELA)}(x) = \frac{\sum_{(y, c_y) \in \mathcal{V}(x)} I(\phi^{(cr)}(y) = c_y)}{|\mathcal{V}(x)|}, \tag{2}$$
where $\mathcal{V}(x) \subseteq \mathcal{V}$ is the set of validation patterns
belonging to some kind of neighborhood of $x$ (for example
the k nearest neighbors under the Euclidean metric).


2.2. Classifier Systems

In classifier combining, instead of using just one
classifier, a team of classifiers is created (sometimes called an
ensemble of classifiers), and the team is then aggregated
into one final classifier. If we want to utilize classification
confidence in the aggregation process, each classifier
must have its own confidence measure defined.

Definition 3 Let $r \in \mathbb{N}$, $r \ge 2$. A classifier team is
a tuple $(\mathcal{T}, \mathcal{K})$, where $\mathcal{T} = \{\phi_1, \ldots, \phi_r\}$ is a set of
classifiers, and $\mathcal{K} = \{\kappa_{\phi_1}, \ldots, \kappa_{\phi_r}\}$ is a set of
corresponding confidence measures.

If a pattern $x$ is submitted for classification, the team
of classifiers returns information of two kinds: outputs
of the individual classifiers (a decision profile [21]), and
classification confidences of the classifiers on $x$ (a
confidence vector).

Definition 4 Let $(\mathcal{T}, \mathcal{K})$ be a classifier team and let $x \in \mathcal{X}$.
Then the decision profile of $(\mathcal{T}, \mathcal{K})$ on $x$ is a matrix
$\mathcal{T}(x) \in [0,1]^{r \times N}$,
$$\mathcal{T}(x) = \begin{pmatrix} \phi_1(x) \\ \phi_2(x) \\ \vdots \\ \phi_r(x) \end{pmatrix} = \begin{pmatrix} \gamma_{1,1}(x) & \gamma_{1,2}(x) & \cdots & \gamma_{1,N}(x) \\ \gamma_{2,1}(x) & \gamma_{2,2}(x) & \cdots & \gamma_{2,N}(x) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{r,1}(x) & \gamma_{r,2}(x) & \cdots & \gamma_{r,N}(x) \end{pmatrix}, \tag{3}$$
and the confidence vector of $(\mathcal{T}, \mathcal{K})$ on $x$ is a vector
$\mathcal{K}(x) \in [0,1]^r$,
$$\mathcal{K}(x) = \begin{pmatrix} \kappa_{\phi_1}(x) \\ \vdots \\ \kappa_{\phi_r}(x) \end{pmatrix}.$$

After the pattern $x$ has been classified by all the
classifiers in the team, and the confidences have been
computed, these outputs have to be aggregated using a team
aggregator. A classifier team with an aggregator will be
called a classifier system, which can also be viewed as
a single classifier.

Definition 5 Let $(\mathcal{T}, \mathcal{K})$ be a classifier team, and let
$\mathcal{A} : [0,1]^{r \times N} \times [0,1]^r \to [0,1]^N$. The triple $\mathcal{S} = (\mathcal{T}, \mathcal{K}, \mathcal{A})$
is called a classifier system and $\mathcal{A}$ is called a team
aggregator. We define an induced classifier of $\mathcal{S}$ as a
classifier $\Phi$:
$$\Phi(x) = \mathcal{A}(\mathcal{T}(x), \mathcal{K}(x)) = (\gamma_1(x), \ldots, \gamma_N(x)).$$

An example of an aggregation operator is the mean value,
which defines the aggregated d.o.c. to class $j$ as the
arithmetic mean of the d.o.c. to class $j$ given by the
individual classifiers in the team:
$$\gamma_j(x) = \frac{\sum_{i=1}^{r} \gamma_{i,j}(x)}{r}. \tag{5}$$


We can distinguish three types of classifier systems:
confidence-free (which do not utilize the classification
confidence at all), static (which use only static
classification confidence), and dynamic (which use dynamic
classification confidence, i.e., the aggregation is adapted
to a particular pattern). In this paper, we are mainly
interested in dynamic classifier systems.

Many aggregation operators have been studied in the
literature: simple arithmetic operations (voting, sum,
maximum, minimum, mean, weighted mean, weighted
voting, product, etc., [21]), probability-based approaches
(e.g., product rule [21], Dempster-Shafer fusion [21]),
and fuzzy logic methods (fuzzy integral [12], decision
templates [12,21]). Our key interest in this paper lies in
studying dynamic classifier aggregation using the fuzzy
integral, which is described in the following section.

3. Fuzzy Integral, Measures and Similarity

The fuzzy integral [10, 11, 22] is an aggregation operator
based on a fuzzy measure (sometimes called capacity),
which is a generalization of the additive measure, such
that the additivity is replaced by a weaker condition,
monotonicity. Several definitions of a fuzzy integral
exist in the literature; among them, the Choquet integral
and the Sugeno integral are used most often. In this
section, we briefly summarize the basic definitions, and
we show how the fuzzy integral can be used in classifier
aggregation. For simplicity, we restrict ourselves
to the discrete case, and to functions in [0,1].

Definition 6 A fuzzy measure $\mu$ on a set $\mathcal{U} = \{u_1, \ldots, u_r\}$
is a function on the power set of $\mathcal{U}$,
$\mu : \mathcal{P}(\mathcal{U}) \to [0,1]$, such that:

1. $\mu(\emptyset) = 0$, $\mu(\mathcal{U}) = 1$ (boundary conditions)

2. $A \subseteq B \Rightarrow \mu(A) \le \mu(B)$ (monotonicity)

As the universe $\mathcal{U}$ will correspond to the set of
classifiers in the team, we use $r$ to denote the universe size
(cf. Sec. 3.1). We can now define the Choquet integral,
which is a generalization of the classical probabilistic
integral (for additive measures, it reduces to the Lebesgue
integral, i.e., the weighted mean in the discrete case), and
the Sugeno integral. As there is no generally accepted
definition of a fuzzy integral [10,23], we restrict ourselves
to the Choquet and Sugeno integrals in the rest of
the paper.

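We will use the following notation. Let $f : \mathcal{U} = \{u_1, \ldots, u_r\} \to [0,1]$,
$f(u_i) = f_i$, $i = 1, \ldots, r$. Then
$\langle \cdot \rangle$ indicates that the indices have been permuted,
such that $0 = f_{\langle 0 \rangle} \le f_{\langle 1 \rangle} \le \cdots \le f_{\langle r \rangle} \le 1$.
Moreover, $A_{\langle i \rangle} = \{u_{\langle i \rangle}, \ldots, u_{\langle r \rangle}\}$ denotes the set
of elements of $\mathcal{U}$ corresponding to the $(r - i + 1)$ highest
values of $f$.

Definition 7 Let $\mu$ be a fuzzy measure on $\mathcal{U}$. Then the
Choquet integral of a function $f : \mathcal{U} \to [0,1]$, $f(u_i) = f_i$,
$i = 1, \ldots, r$, with respect to $\mu$ is defined as:
$$(C) \int f\, d\mu = \sum_{i=1}^{r} (f_{\langle i \rangle} - f_{\langle i-1 \rangle})\, \mu(A_{\langle i \rangle}). \tag{6}$$

Definition 8 Let $\mu$ be a fuzzy measure on $\mathcal{U}$. Then the
Sugeno integral of a function $f : \mathcal{U} \to [0,1]$, $f(u_i) = f_i$,
$i = 1, \ldots, r$, with respect to $\mu$ is defined as:
$$(S) \int f\, d\mu = \max_{i=1,\ldots,r} \min(f_{\langle i \rangle}, \mu(A_{\langle i \rangle})). \tag{7}$$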
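Both discrete integrals in (6) and (7) reduce to a sort followed by a single pass over the values. The Python sketch below is only an illustration: the function names are hypothetical, the fuzzy measure is passed as a callable on frozensets of indices, and the additive measure used in the example is an assumption chosen so that the Choquet integral can be compared with a weighted mean.

```python
def _ascending(values):
    """Indices of the values sorted in ascending order, i.e. the permutation <.>."""
    return sorted(range(len(values)), key=lambda i: values[i])

def choquet_integral(values, measure):
    """Discrete Choquet integral (6); measure maps a frozenset of indices to [0, 1]."""
    order = _ascending(values)
    total, previous = 0.0, 0.0
    for pos, idx in enumerate(order):
        upper_set = frozenset(order[pos:])   # A_<i>: indices of the current and all larger values
        total += (values[idx] - previous) * measure(upper_set)
        previous = values[idx]
    return total

def sugeno_integral(values, measure):
    """Discrete Sugeno integral (7)."""
    order = _ascending(values)
    return max(min(values[idx], measure(frozenset(order[pos:])))
               for pos, idx in enumerate(order))

# Illustrative additive measure given by weights that sum to 1.
weights = [0.5, 0.3, 0.2]
additive = lambda subset: sum(weights[i] for i in subset)
doc_values = [0.2, 0.9, 0.5]
choquet_value = choquet_integral(doc_values, additive)
sugeno_value = sugeno_integral(doc_values, additive)
weighted_mean = sum(w * v for w, v in zip(weights, doc_values))
```

With an additive measure the Choquet integral coincides with the weighted mean of the values, which gives a simple sanity check for any implementation.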

3.1. Fuzzy Integral in Classifier Aggregation

In classifier aggregation, the universe $\mathcal{U}$ corresponds to
the set of classifiers $\mathcal{T}$ in the team, i.e., $\mathcal{U} = \mathcal{T} = \{\phi_1, \ldots, \phi_r\}$.
For $x \in \mathcal{X}$, the individual columns of the
decision profile $\mathcal{T}(x)$ are integrated using the fuzzy
integral, i.e., the aggregated d.o.c. to class $j$ is defined as
$$\gamma_j(x) = \int \mathcal{T}_{*,j}(x)\, d\mu, \tag{8}$$
where $\int$ is a fuzzy integral, $\mathcal{T}_{*,j}$ is the j-th column of $\mathcal{T}(x)$
(d.o.c. to class $C_j$), and $\mu$ is a fuzzy measure on $\mathcal{T}$. The
fuzzy measure $\mu$ represents the importance of a particular
set of classifiers used in the integration ($\mu(A_{\langle i \rangle})$
represents the importance of the classifiers corresponding
to the $(r - i + 1)$ highest d.o.c.). Usually, $\mu$ somehow
depends on the confidence vector $\mathcal{K}(x)$.

3.2. Important Types of Fuzzy Measures

The behavior of the fuzzy integral depends heavily on
the considered fuzzy measure. As the definition of a
fuzzy measure is very general, it gives us a lot of
freedom when defining a fuzzy measure. However, to define
a general fuzzy measure in the discrete case, we need to
define all its $2^r$ values, which is usually very complicated.
To overcome this weakness, approaches which do
not need all the $2^r$ values have been developed [10,11].

3.2.1 Additive Measures:

Definition 9 A fuzzy measure $\mu$ on $\mathcal{U}$ is called additive
if $\mu(A \cup B) = \mu(A) + \mu(B)$ for disjoint $A, B \subseteq \mathcal{U}$.

Additive measures correspond to the classical
probabilistic measures. The measure is defined only using the
values for the singletons, $\mu(\{u_i\})$, $i = 1, \ldots, r$ (called
fuzzy densities), and all the remaining values are
computed using the additivity condition. However, such a
measure cannot model interaction between the elements of
the fuzzy measure space (which in particular implies
that the diversity of the team of classifiers cannot be
taken into account in the aggregation). The Choquet integral
with an additive measure reduces to the weighted mean.

3.2.2 Symmetric Measures:

Definition 10 A fuzzy measure $\mu$ on $\mathcal{U}$ is called symmetric
if for $A, B \subseteq \mathcal{U}$, $|A| = |B| \Rightarrow \mu(A) = \mu(B)$,
i.e., its value depends only on the cardinality of the
argument, $\mu(A) = g(|A|)$.

We can choose any nondecreasing function $g$, such that
$g(0) = 0$ and $g(r) = 1$, to model the importance of
a set of $r$ elements. If a symmetric measure is used in the
Choquet integral, the integral reduces to the Ordered
Weighted Average operator [10]. However, symmetric
measures assume that all the classifiers have the same
importance, and thus not only do symmetric fuzzy
measures not take similarities of the classifiers into
account, but moreover, the resulting aggregation scheme is
confidence-free, i.e., the classification confidence does
not influence the aggregation. As we deal with dynamic
classifier systems only, we do not take symmetric
measures into account in the rest of the paper.

3.2.3 $\perp$-decomposable Measures:

Definition 11 Let $\mu$ be a fuzzy measure on $\mathcal{U}$ and let $\perp$
be a t-conorm. Then $\mu$ is called $\perp$-decomposable if for
disjoint $A, B \subseteq \mathcal{U}$,
$$\mu(A \cup B) = \mu(A) \perp \mu(B). \tag{9}$$

$\perp$-decomposable measures need only the $r$ fuzzy densities,
and all the other values are computed using the
formula (9). Particular cases of $\perp$-decomposable fuzzy
measures are additive measures ($\perp$ being the bounded
sum), and the Sugeno $\lambda$-measure [10,11], defined as
$$\mu(A \cup B) = \mu(A) + \mu(B) + \lambda \mu(A) \mu(B), \tag{10}$$
for disjoint $A, B \subseteq \mathcal{U}$, and some fixed $\lambda > -1$. The
value of $\lambda$ is computed as the unique non-zero root greater
than $-1$ of the equation
$$\lambda + 1 = \prod_{i=1,\ldots,r} (1 + \lambda \mu(\{u_i\})), \tag{11}$$
if the densities do not sum to 1. If they do sum to 1,
$\lambda = 0$ and the fuzzy measure is additive.
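Equation (11) has no closed-form solution in general, but the sign pattern of its two sides makes a bisection search straightforward. The sketch below is a minimal illustration in plain Python; the function names are hypothetical, and it assumes at least two densities strictly between 0 and 1 (otherwise the bracketing step would not terminate or the root would sit on the boundary).

```python
import math

def sugeno_lambda(densities, tol=1e-12):
    """Solve lambda + 1 = prod(1 + lambda * g_i) for the non-zero root greater than -1."""
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0                                  # densities sum to 1: additive measure
    g = lambda lam: math.prod(1.0 + lam * d for d in densities) - (1.0 + lam)
    if total > 1.0:
        lo, hi, left_sign = -1.0, 0.0, 1.0          # root in (-1, 0); g > 0 left of it
    else:
        lo, hi, left_sign = 0.0, 1.0, -1.0          # root in (0, inf); g < 0 left of it
        while g(hi) < 0.0:                          # grow the bracket until the sign flips
            hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) * left_sign > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sugeno_measure(subset, densities, lam):
    """Measure of a set of indices, built element by element with the rule (10)."""
    value = 0.0
    for i in subset:
        value = value + densities[i] + lam * value * densities[i]
    return value

densities = [0.2, 0.3, 0.1]
lam = sugeno_lambda(densities)
full_measure = sugeno_measure(range(len(densities)), densities, lam)
```

Densities that sum to less than 1 yield a positive value of the parameter and densities that sum to more than 1 a negative one; in both cases the measure of the whole universe evaluates to 1, as required by the boundary condition.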

The Sugeno $\lambda$-measure is used most often in classifier
aggregation using the fuzzy integral (with the fuzzy
densities corresponding to the classification confidences,
$\mu(\{u_i\}) = \kappa_{\phi_i}(x)$). However, its use is usually not
supported by any arguments and it is basically selected
because of its simplicity.

A strong weakness of any $\perp$-decomposable measure
(and the Sugeno $\lambda$-measure in particular) is that it cannot
model the interaction (similarities) between the
classifiers, because the fuzzy measure value of a set of two
(or more) classifiers is fully determined by the formula
(9) with a fixed $\perp$. Therefore, the diversity of the team
of classifiers cannot be taken into account in the
aggregation (as in the case of additive measures).

To overcome the weaknesses of the methods presented
above, we have defined an Interaction-Sensitive Fuzzy
Measure (ISFM) [1], which is defined not only using the
fuzzy densities, but also using mutual similarities of the
classifiers in the team. The method is described in the
following section, but prior to that, we formally define
the concept of a similarity [24].

3.3. Similarity of Classifiers 

Definition 12 Let A be a t-norm and let S : lA xlA ^ 
[0,1] be a fuzzy relation. S is called a similarity with 
respect to A if the following holds Vo, &, c 6 lA: 


• 5(0, a) = 1 (reflexivity) 

• S{a,b) = S{b,a) (symmetry) 

• S{a, b) A S{b, c) < S(a, c) (transitivity w.r.t. A) 

In the context of classifier combining, we will work with
the similarity of classifiers in particular, which, for
classifiers $\phi_k, \phi_l$, will be measured empirically as the
proportion of equal crisp predictions on the validation set $V$,

$$S(\phi_k, \phi_l) = \frac{|\{x \in V : \phi_k(x) = \phi_l(x)\}|}{|V|}, \qquad (12)$$

where $\phi(x)$ stands for the crisp prediction of $\phi$ on $x$.
The relation (12) is a similarity with respect to the
Łukasiewicz t-norm $\wedge_L$, but it is not a similarity with
respect to the standard or product t-norms $\wedge_S$, $\wedge_P$.
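
As an illustration of relation (12), the following Python sketch computes the empirical similarity of two classifiers from their crisp predictions on a validation set. The names `similarity` and `lukasiewicz` are hypothetical, and the predictions are assumed to be given as equally long sequences of class labels.

```python
def similarity(pred_a, pred_b):
    """Proportion of validation patterns on which two classifiers agree."""
    if len(pred_a) != len(pred_b) or len(pred_a) == 0:
        raise ValueError("prediction lists must be non-empty and equally long")
    agree = sum(1 for a, b in zip(pred_a, pred_b) if a == b)
    return agree / len(pred_a)

def lukasiewicz(x, y):
    """Lukasiewicz t-norm: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)
```

Transitivity w.r.t. the Łukasiewicz t-norm can then be spot-checked for any three prediction vectors `a`, `b`, `c` by verifying `lukasiewicz(similarity(a, b), similarity(b, c)) <= similarity(a, c)`; it holds because the proportion of disagreements behaves like a normalized Hamming distance and satisfies the triangle inequality.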

4. Interaction-Sensitive Fuzzy Measure and its Use
in Fuzzy Integral

Methods for constructing a team of classifiers usually
try to create a team which is both accurate and diverse
[2,3,13]. Diversity of the classifiers in the team is
a key property in classifier combining: if the classifiers
are very similar, combining them cannot
improve the classification quality.

Fuzzy measures represent a convenient tool to work with
the diversity of the team. As $\mu(A_{\langle i \rangle})$ are computed for
$i = r, \dots, 1$, i.e., in the $i$-th step, classifier $\phi_{\langle i \rangle}$ is added to
the set of classifiers $A_{\langle i+1 \rangle} = \{\phi_{\langle i+1 \rangle}, \dots, \phi_{\langle r \rangle}\}$,
we can influence the increase of the fuzzy measure:
if $\phi_{\langle i \rangle}$ is similar to the classifiers in $A_{\langle i+1 \rangle}$, the
increase in the fuzzy measure should be small (since the
importance of the set $A_{\langle i \rangle}$ should be similar to the
importance of the set $A_{\langle i+1 \rangle}$), and if $\phi_{\langle i \rangle}$ is not similar
to the classifiers in $A_{\langle i+1 \rangle}$, the increase of the fuzzy
measure should be large.

$\perp$-decomposable fuzzy measures (and in particular
additive measures and the Sugeno $\lambda$-measure) cannot model
such interactions between the classifiers, because they
are defined only using the fuzzy densities and a fixed
$\perp$. Therefore, we propose an Interaction-Sensitive Fuzzy
Measure (ISFM), which incorporates the similarities of
the classifiers in the team, defined using the following
recursive definition.

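Definition 13 Let $\mathcal{U} = \{u_1, \dots, u_r\}$ be a universe, let
$S$ be a similarity w.r.t. a t-norm $\wedge$, $s_{i,j} = S(u_i, u_j)$,
let $\kappa_i \in [0,1]$, $i = 1, \dots, r$, denote the importance
(weight) of $u_i$, and let $A_{\langle i \rangle} = \{u_{\langle i \rangle}, \dots, u_{\langle r \rangle}\}$,
$A_{\langle r+1 \rangle} = \emptyset$, where $\langle \cdot \rangle$ denotes index ordering
according to some $f : \mathcal{U} \to [0,1]$, such that
$0 \le f_{\langle 1 \rangle} \le \dots \le f_{\langle r \rangle} \le 1$.

Let $\tilde{\mu} : \mathcal{P}(\mathcal{U}) \to \mathbb{R}^{+}$, such that

$$\tilde{\mu}(\emptyset) = 0 \qquad (13)$$

$$\tilde{\mu}(A_{\langle r \rangle}) = \tilde{\mu}(\{u_{\langle r \rangle}\}) = \kappa_{\langle r \rangle} \qquad (14)$$

$$\tilde{\mu}(A_{\langle i \rangle}) = \tilde{\mu}(\{u_{\langle i \rangle}, \dots, u_{\langle r \rangle}\}) = \qquad (15)$$

$$= \tilde{\mu}(A_{\langle i+1 \rangle}) + \Big(1 - \max_{k=i+1,\dots,r} s_{\langle i \rangle,\langle k \rangle}\Big)\,\kappa_{\langle i \rangle} \qquad (16)$$

$$\text{for } i = r-1, \dots, 1, \qquad (17)$$

and $\forall X \subseteq \mathcal{U}$, $X \ne A_{\langle i \rangle}$, $i = 1, \dots, r$:

$$\tilde{\mu}(X) = \tilde{\mu}(A_{\langle q \rangle}), \qquad (18)$$

where $q = \min\{i = r+1, \dots, 1 \mid A_{\langle i \rangle} \subseteq X\}$.

The mapping $\mu^{(f)} : \mathcal{P}(\mathcal{U}) \to [0,1]$, defined as

$$\mu^{(f)}(X) = \frac{\tilde{\mu}(X)}{\tilde{\mu}(A_{\langle 1 \rangle})} = \frac{\tilde{\mu}(X)}{\tilde{\mu}(\mathcal{U})}, \qquad (19)$$

is called an Interaction-Sensitive Fuzzy Measure
(ISFM) on $\mathcal{U}$ with respect to $f$.

For the fuzzy integration itself, only the values for
$A_{\langle i \rangle}$, $i = 1, \dots, r$ (15-17) are needed; the remaining
values (18) represent an extension to the whole power
set and are needed only for $\mu^{(f)}$ to be properly defined.
(19) represents a normalization of $\tilde{\mu}$ to [0,1].

The definition is general and can also be used in
applications other than classifier combining. In classifier
combining, $\mathcal{U} = \{\phi_1, \dots, \phi_r\}$ is the set of classifiers,
$\kappa_i = \kappa_{\phi_i}(x)$ are the classification confidences, $f$ is the
$j$-th column of the decision profile, and $S$ denotes the similarity
of classifiers (12). The following proposition shows that
$\mu^{(f)}$ is well-defined.

Proposition 1 $\mu^{(f)}$ is a fuzzy measure on $\mathcal{U}$.

Proof: The boundary conditions follow directly from
the definition of $\mu^{(f)}$. Let $X \subseteq Y \subseteq \mathcal{U}$. Then
$q_X = \min\{i = r+1, \dots, 1 \mid A_{\langle i \rangle} \subseteq X\} \ge q_Y =
\min\{i = r+1, \dots, 1 \mid A_{\langle i \rangle} \subseteq Y\}$, and thus
$\mu^{(f)}(X) = \mu^{(f)}(A_{\langle q_X \rangle}) \le \mu^{(f)}(A_{\langle q_Y \rangle}) = \mu^{(f)}(Y)$,
which proves the monotonicity. ■

In (16), the term $1 - \max_k s_{\langle i \rangle,\langle k \rangle}$ incorporates the
diversity of the team of classifiers into the fuzzy measure.
The following proposition shows that if for some $i$
the $i$-th classifier is totally similar to some other
classifier in $A_{\langle i+1 \rangle}$, then $\mu^{(f)}$ does not increase, and if it is
totally unsimilar to all classifiers in $A_{\langle i+1 \rangle}$, the
increase in the fuzzy measure is maximal.

Proposition 2 Let $\mu^{(f)}$ be an ISFM on $\mathcal{U}$ w.r.t. $f : \mathcal{U} \to [0,1]$,
and let $i \in \{1, \dots, r-1\}$. Then the following
holds:

1. $\exists k \in \{i+1, \dots, r\}: s_{\langle i \rangle,\langle k \rangle} = 1 \Rightarrow
   \mu^{(f)}(A_{\langle i \rangle}) = \mu^{(f)}(A_{\langle i+1 \rangle})$

2. $\forall k \in \{i+1, \dots, r\}: s_{\langle i \rangle,\langle k \rangle} = 0 \Rightarrow
   \mu^{(f)}(A_{\langle i \rangle}) = \mu^{(f)}(A_{\langle i+1 \rangle}) + \kappa_{\langle i \rangle} / \tilde{\mu}(\mathcal{U})$

Proof: Trivially from (16) and (19). ■

The following proposition describes an extreme case,
in which all the classifiers are totally similar (the measure
in the integral behaves like a constant measure and
the Choquet and Sugeno integrals reduce to the maximum
value).

Proposition 3 Let $\mu^{(f)}$ be an ISFM on $\mathcal{U}$ w.r.t. $f : \mathcal{U} \to [0,1]$,
$f(u_i) = f_i$, and let $\forall i, j \in \{1, \dots, r\}$, $i \ne j$,
$s_{i,j} = 1$. Then $\forall X \subseteq \mathcal{U}$:

1. $\forall k \in \{1, \dots, r\}: \mu^{(f)}(A_{\langle k \rangle}) = 1$

2. $\exists k \in \{1, \dots, r\}: A_{\langle k \rangle} \subseteq X \Rightarrow \mu^{(f)}(X) = 1$

3. $\forall k \in \{1, \dots, r\}: A_{\langle k \rangle} \not\subseteq X \Rightarrow \mu^{(f)}(X) = 0$

4. $(C)\int f \, d\mu^{(f)} = (S)\int f \, d\mu^{(f)} = \max_{i=1,\dots,r} f_i$

Proof: (1) follows directly from (15-17) and (19); (2),
(3) from (18); and (4) is an application of the measure to
the definition of the Choquet and Sugeno integrals. ■

Another extreme case is that all the classifiers are
totally unsimilar (the measure in the integral behaves like
an additive measure and the Choquet integral reduces to
the weighted mean).

Proposition 4 Let $\mu^{(f)}$ be an ISFM on $\mathcal{U}$ w.r.t. $f : \mathcal{U} \to [0,1]$,
$f(u_i) = f_i$, and let $\forall i, j \in \{1, \dots, r\}$, $i \ne j$,
$s_{i,j} = 0$. Then the following holds:

1. $\forall k \in \{1, \dots, r\}: \mu^{(f)}(A_{\langle k \rangle}) =
   \dfrac{\sum_{i=k}^{r} \kappa_{\langle i \rangle}}{\sum_{i=1}^{r} \kappa_{\langle i \rangle}}$

2. $(C)\int f \, d\mu^{(f)} = \dfrac{\sum_{i=1}^{r} \kappa_i f_i}{\sum_{i=1}^{r} \kappa_i}$

Proof: (1) follows directly from (15-17) and (19). (2)
is an application of the measure to the definition of the
Choquet integral. ■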
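
The recursion (13)-(17) with the normalization (19) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' Java implementation: the function names are hypothetical, the similarity matrix `sim` is assumed to be a list of lists with ones on the diagonal, and the discrete Choquet and Sugeno integrals are written in their usual chain forms, $\sum_i (f_{\langle i \rangle} - f_{\langle i-1 \rangle})\,\mu(A_{\langle i \rangle})$ and $\max_i \min(f_{\langle i \rangle}, \mu(A_{\langle i \rangle}))$, which are assumed to match the definitions given earlier in the paper.

```python
def isfm_chain(f, kappa, sim):
    """Return (order, mu): indices sorted by ascending f, and the normalized
    ISFM values mu[i] of the sets A_<i> = {u_<i>, ..., u_<r>} (0-based)."""
    r = len(f)
    order = sorted(range(r), key=lambda i: f[i])
    raw = [0.0] * (r + 1)                 # raw[r] is the empty set, eq. (13)
    raw[r - 1] = kappa[order[-1]]         # eq. (14)
    for i in range(r - 2, -1, -1):        # eqs. (15)-(17)
        max_sim = max(sim[order[i]][order[k]] for k in range(i + 1, r))
        raw[i] = raw[i + 1] + (1.0 - max_sim) * kappa[order[i]]
    total = raw[0]                        # measure of the whole universe
    if total <= 0.0:
        raise ValueError("all weights are zero; the measure cannot be normalized")
    return order, [v / total for v in raw]   # eq. (19)

def choquet_isfm(f, kappa, sim):
    """Discrete Choquet integral of f w.r.t. the ISFM."""
    order, mu = isfm_chain(f, kappa, sim)
    result, prev = 0.0, 0.0
    for i, idx in enumerate(order):
        result += (f[idx] - prev) * mu[i]
        prev = f[idx]
    return result

def sugeno_isfm(f, kappa, sim):
    """Discrete Sugeno integral of f w.r.t. the ISFM."""
    order, mu = isfm_chain(f, kappa, sim)
    return max(min(f[idx], mu[i]) for i, idx in enumerate(order))
```

With totally unsimilar classifiers (identity similarity matrix) the Choquet integral should coincide with the weighted mean, as in Proposition 4, and with totally similar classifiers both integrals should return the maximum of f, as in Proposition 3.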

5. Experiments 

To experimentally compare the ISFM-based approach
with the Sugeno $\lambda$-measure approach, we designed three
different classifier systems:

• Random Forest ensemble [16]. In our experiments,
we used r = 20 trees.

• Ensemble of k-Nearest neighbor classifiers [17]
created by bagging [14]. In our experiments, we
used r = 20 classifiers in the team with k = 5.

• Ensemble of Quadratic discriminant classifiers
[17] created by the multiple feature subset method
(MFS) [18]. Each classifier was trained only on a
subset of features. For datasets with n < 5 dimensions,
all possible subsets (feature combinations)
in the MFS were used. For higher dimensional
datasets, 32 subsets of features were selected
by bagging.

To compute the classification confidence, we used the
ELA method (2). The number of neighbors was set
based on the size of the dataset to k = 5 (< 500 patterns),
k = 10 (501-1000 patterns), or k = 20 (> 1000
patterns). The values of the parameters were set based on
preliminary testing; no optimization or fine-tuning was
done. As aggregation operators, we used the following:

• Weighted mean - representing the baseline (special
case of the Choquet integral with additive measure)

• Choquet integral with the $\lambda$-measure

• Choquet integral with the ISFM

• Sugeno integral with the $\lambda$-measure

• Sugeno integral with the ISFM

• Single best (for reference) - mean error rate of the
classifier with lowest error rate selected in each
cross-validation run, representing the “worst-case”
scenario

• Oracle (for reference) - the theoretical “best-case”
scenario, which, for a given pattern, gives
correct prediction if and only if any of the classifiers
in the team gives correct prediction


The methods were implemented in the Java programming
language and the experiment was performed on
7 artificial and 19 real-world datasets with varying size,
dimensionality, and class count (due to numerical
instabilities of the QDC model, we had to leave out three
real-world datasets for the QDC ensemble). The properties
of the datasets are shown in Table 1. We used 10-fold
cross-validation to measure the performance of the methods
(8 folds for the training set, the 9th fold for the validation
set, the 10th fold for the testing set, with cyclic shift). The
validation set was used to compute the classification
confidence and the similarity of the classifiers in the team,
and the testing set was used to compare the results of the
methods. The mean value and standard deviation of the
error rate were measured. We also measured the statistical
significance of the results (at the 5% significance level, by the
analysis of variance using the Tukey-Kramer method).
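
The cyclic 8/1/1 fold assignment described above can be sketched as follows. This is a minimal illustration in Python under the assumption that the folds are numbered 0 to 9 and that "cyclic shift" means rotating the fold roles by one position in each run; the function name `cyclic_splits` is hypothetical.

```python
def cyclic_splits(n_folds=10):
    """Yield (training folds, validation fold, testing fold) for each run.

    In every run, the last two folds of the rotated order serve as the
    validation and testing folds; the remaining ones form the training set.
    """
    for shift in range(n_folds):
        folds = [(shift + j) % n_folds for j in range(n_folds)]
        yield folds[:-2], folds[-2], folds[-1]
```

Over the ten runs, every fold serves exactly once as the testing fold and exactly once as the validation fold.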


Table 1: Properties of the datasets used in the experiments.

  Dataset        ref.   size    classes  dimensions
  ---------------------------------------------------
  Artificial
  clouds         [25]   5000    2        2
  concentric     [25]   2500    2        2
  gauss 3D       [25]   5000    2        3
  gauss 8D       [25]   5000    2        8
  ringnorm       [26]   3000    2        20
  twonorm        [26]   3000    2        20
  waveform       [26]   5000    3        21
  ---------------------------------------------------
  Real-world
  balance        [26]   625     3        4
  breast         [26]   699     2        9
  glass          [26]   214     7        9
  iris           [26]   150     3        4
  letter-recg.   [26]   20000   26       16
  pendigits      [26]   10992   10       16
  phoneme        [25]   5427    2        5
  pima           [26]   768     2        8
  poker          [26]   4828    3        10
  satimage       [25]   6435    6        4
  segmentation   [26]   2310    7        16
  sonar          [26]   208     2        60
  texture        [25]   5500    11       10
  transfusion    [26]   748     2        4
  vehicle        [26]   946     4        18
  vowel          [26]   990     11       10
  wine           [26]   178     3        13
  wineq-red      [26]   1600    3        11
  wineq-white    [26]   4898    3        11
  yeast          [26]   1484    4        8
Table 2: Random Forest: The i, j-th element of the table shows the number of datasets in which method i obtained lower mean
error rate than method j. The number in parentheses, if present, shows the number of datasets for which the improvement
was statistically significant (excluding Oracle). The last column shows the number of datasets for which a given method
was better than all the other methods (excluding Oracle).

  ↓ superior to →   SB       WMean   CI-λ     CI-ISFM  SI-λ     SI-ISFM  Oracle  all
  ------------------------------------------------------------------------------------
  SB                -        0       1(1)     0        1(1)     0        0       0
  WMean             26(16)   -       12(3)    3        12(5)    5        0       1
  CI-λ              25(16)   14      -        5        14       8        0       1
  CI-ISFM           26(18)   23      21(5)    -        19(5)    16       0       11
  SI-λ              25(17)   14      12       6        -        8        0       4
  SI-ISFM           26(18)   21      18(3)    10       18(4)    -        0       9
  Oracle            26       26      26       26       26       26       -       26

Table 3: k-NN ensemble: The i, j-th element of the table shows the number of datasets in which method i obtained lower mean
error rate than method j. The number in parentheses, if present, shows the number of datasets for which the improvement
was statistically significant (excluding Oracle). The last column shows the number of datasets for which a given method
was better than all the other methods (excluding Oracle).

  ↓ superior to →   SB       WMean   CI-λ     CI-ISFM  SI-λ     SI-ISFM  Oracle  all
  ------------------------------------------------------------------------------------
  SB                -        7       3        2        2        2        0       0
  WMean             19(1)    -       3        4        3        3        0       0
  CI-λ              23(3)    23      -        10       17       11       0       9
  CI-ISFM           24(6)    22(3)   16       -        19(1)    14       0       10
  SI-λ              25(2)    23(1)   11       7        -        8        0       2
  SI-ISFM           24(8)    23(3)   15       12       18(1)    -        0       7
  Oracle            26       26      26       26       26       26       -       26


Table 4: QDC ensemble: The i, j-th element of the table shows the number of datasets in which method i obtained lower mean
error rate than method j. The number in parentheses, if present, shows the number of datasets for which the improvement
was statistically significant (excluding Oracle). The last column shows the number of datasets for which a given method
was better than all the other methods (excluding Oracle).

  ↓ superior to →   SB       WMean   CI-λ     CI-ISFM  SI-λ     SI-ISFM  Oracle  all
  ------------------------------------------------------------------------------------
  SB                -        8       6        4        7        3        0       1
  WMean             15(8)    -       12(2)    2        13(2)    4        0       0
  CI-λ              17(6)    11      -        5        14(1)    7        0       3
  CI-ISFM           19(8)    21(4)   19(5)    -        19(5)    11       0       10
  SI-λ              16(8)    10      9        4        -        7        0       1
  SI-ISFM           20(9)    19(4)   16(5)    12       16(5)    -        0       8
  Oracle            23       23      23       23       23       23       -       23
To compare the methods in general, we measured the
number of datasets in which a given method outperformed
other methods; the results are shown in Tables 2-4.

As our main goal in this experiment was to compare
ISFM with the Sugeno $\lambda$-measure, we can say the
following. For the Random Forests with the Choquet
integral, ISFM outperformed the $\lambda$-measure on 21 datasets
(5 times significant), with the Sugeno integral on 18 datasets
(4 times significant), out of 26 datasets total. For the
k-NN ensemble with the Choquet integral, ISFM outperformed
the $\lambda$-measure on 16 datasets (none significant), with
the Sugeno integral on 18 datasets (once significant), out of
26 datasets total. For the QDC ensemble with the Choquet
integral, ISFM outperformed the $\lambda$-measure on 19 datasets
(5 times significant), with the Sugeno integral on 16 datasets
(6 times significant), out of 23 datasets total.

Generally speaking, the fuzzy integral with ISFM
outperformed the $\lambda$-measure in most cases (sometimes
statistically significantly, whereas the $\lambda$-measure never
outperformed ISFM significantly). The Choquet integral
obtained slightly better results than the Sugeno integral,
and the Choquet integral with ISFM was the most
successful aggregation scheme in these experiments.
Another interesting result is that while both Choquet and
Sugeno integrals with ISFM outperformed the Weighted
mean, this is not true for the Sugeno $\lambda$-measure:
in most cases, both Choquet and Sugeno integrals with
the $\lambda$-measure obtained comparable or significantly worse
results than the Weighted mean.


6. Conclusion 

In this paper, we have summarized how the fuzzy integral
can be used as an aggregation operator in dynamic
classifier systems. We have argued that symmetric
and $\perp$-decomposable fuzzy measures are not appropriate
for classifier combining with the fuzzy integral,
and we have introduced an interaction-sensitive fuzzy
measure (ISFM), which tries to overcome the weaknesses
of these methods. ISFM, designed specifically
for use in classifier aggregation, provides a convenient
tool for representing the diversity of the team of
classifiers, and, when used in the fuzzy integral, the
aggregation can incorporate the classifier predictions,
the classification confidences, and also the diversity of
the team. Our experiments with three different dynamic
classifier systems with the Choquet and Sugeno integrals
on 26 datasets show that the ISFM outperforms the Sugeno
$\lambda$-measure, which is used most often in the literature
in connection with the fuzzy integral.
Abstract

Trust management systems have been proposed to address joyless security in open and distributed environments
(Web, Semantic Web, Peer-to-peer networks, etc.). A user usually spends a lot of time building her/his reputation
and creating a network of trusted/distrusted users. Without the possibility of a seamless transfer from one trust
management system to another, a user is forced to build a new reputation and network of trusted users again. This
problem will become even more severe as many current systems that use trust as a key factor influencing the ability to
communicate within a group of users become outdated and some of them are even shut down.

The paper presents a specification of the seamless transfer problem and introduces a solution - the Heritage
Trust Model based on dynamic graphs and ontologies.

From our point of view, the main property missing in known trust management systems is the ability to store the
evolution of relationships/reputations between users. Thus, we propose a model that is able to store the whole evolution
of reputation/social relationships between users.

The following points give the basis of the proposal:

• Each user in the system has her/his own viewpoint of her/his vicinity. This means that relationships are not symmetric
and a user may not be aware of what the others think of her/him. This is important, as the same peculiarity exists
in real human societies.

• A particular relationship/trust between users has a different meaning depending on the context.

• Each user is responsible for her/his own network. Each user may have her/his own preferences, aims, and wills.

The Heritage Trust Model aims to overcome the main insufficiencies of current trust management systems. As the
most severe we have identified the impossibility of transferring reputation/trust relationships from one system to another, as
well as the impossibility of comparing trust management systems that use different notions of trust.

To overcome these issues we have proposed a trust model based on the idea of maintaining the history (evolution) of
relationships between users, with the addition of an ontology for the description of relationships between users. Using ontology
roles as weights assigned to relationships allows both transfer and comparison of trust management systems. Storing
the evolution of relationships also enables fault tolerance - users are able to learn from their own mistakes.

As future work, we are going to implement the Heritage Trust Model for storing the evolution of relationships
and verify the expected space complexity. The next step would be the implementation of an ontology designed/extended
to be appropriate for modelling the whole complexity of human relationships, and an experimental comparison of selected trust
management systems using the Heritage Trust Model.

My contribution was to propose an ontology for the Heritage Trust Model with the possibility of collaboration with
existing ones.


This work was published and presented: SPÁNEK, ROMAN - TYL, PAVEL. The Heritage Trust Model. In: Proceedings of International
Conference on Digital Information and Communication Technology and its Applications (DICTAP 2011), (Eds. H. Cherifi, J. M. Zain, E.
El-Qawasmeh), Communications in Computer and Information Science (CCIS), Part II, Vol. 167, pp. 307-321, Springer 2011. ISBN 978-3-642-22026-5.
Presented at: International Conference on Digital Information and Communication Technology and its Applications (DICTAP 2011),
21.-23. 6. 2011, Dijon, France.
Abstract


This contribution provides a generalization of many particular results about special types of filters, e.g. (positive)
implicative, fantastic and boolean filters on algebras of Rasiowa implicative logics. Our approach uses the framework
of Abstract Algebraic Logic (AAL) and is based on the close connection between the filter-defining conditions and
alternative axiomatizations of the logics involved.

The key notion of this work is the notion of íí-L-filter, which arises from the standard definition of L-filter in 
AAL, and allows us to deal with L-filters satisfying given speciál conditions in a uniform way. 

We have identified four main kinds of theorems proved in several papers published in the last five years and we
have formulated general theorems which - together with straightforward syntactical proofs - yield the majority of
published results as their direct consequences.
Abstract

The article discusses the results of structural and
lexical analysis of medical reports. In this part of
medical report processing, I verified in practice
the usability of available classification
systems as well as general-purpose tools and databases.

1. Research Question and Expected Contribution

The main goal of the work is to determine the specific properties
of Czech medical reports with respect to the possibility of
extracting particular data from them. Achieving the goal presupposes
meeting the following sub-goals:

1. Answer the question "Which properties of Czech
medical reports cause the greatest problems in the
individual non-statistical phases of natural
language processing?". The individual phases
are structural analysis, lexical analysis,
and word-level analysis.

2. Propose a basic procedure for analysing medical
reports written in Czech.

3. Using my own implementation together with external
tools, verify the proposed procedure for
analysing medical reports written in Czech, and
publish the basic procedure and the results.

I formulated the hypothesis to be verified as follows: "From
specialised medical reports written in Czech, it is possible,
under the supervision of an expert and using natural language
processing technologies, to obtain specified
expert information, for example a list of known allergic
reactions or the results of biochemical examinations."

The contribution of the research should be to bring closer, or
directly implement, tools for assisted extraction of information
from medical texts written in Czech. The extracted
information can subsequently be used for the needs of electronic
health records or together with
other technologies (e.g. as input
data for automata carrying out formalised medical
guidelines).

The topic of information extraction from medical reports was
addressed by Semecký, who in [1] gave reasons why
it seems that linguistic analysis of medical reports cannot
succeed. In [1], Semecký mainly used regular
expressions to extract numerical values.
Smatana followed up on [1] in [2], extended Semecký's approach
with linguistic analysis, and reached slightly better
results.

From my work I expect a further extension, primarily the
creation of a working code list for cardiology linked
to UMLS concepts [3] and its application to the available
medical reports.

2. Czech Medical Reports

Czech medical reports are mostly text documents.
Their content and form are governed by Act No. 20/1966
Coll., as amended, "on the care of people's health"
[4] (chiefly in § 67b) and by Decree No. 385/2006 Coll.,
as amended, "on medical documentation"
[5] (the decree is binding, since the regulation is enabled by
§ 67b(19) of the Act).

The formatting style of medical reports varies even though
the decree on medical documentation exhaustively enumerates
the content of medical documentation for its individual
types. Physicians usually create records in medical
documentation from a template, or by modifying the
last report of the same type for the same patient. This
approach saves physicians time; the individual types
of reports usually have to contain a large amount of information
that changes little over time, such as the identification of the
healthcare facility, administrative data about the patient
(date of birth, insurance number, address of residence), part
of the diagnostic summary (chiefly long-term diagnoses
and known allergies), and long-term medication (e.g. drugs
for lowering blood pressure).

For the research I have sets of reports from two sources.
In practical verification of procedures, I therefore use data from
one source to set up the verification experiment
(e.g. to build a dictionary) and data from the other
source to determine the success rate of the method.

3. Structural Analysis

Structural analysis is the first phase of text
processing. Its task is tokenization,
splitting into sentences, and possibly also into higher structures
(e.g. paragraphs).

The usual approach to structural analysis is to
split the input text at special characters, i.e.
symbols that end words (space, comma, semicolon) and
sentences (full stop, question mark, exclamation mark). Czech medical reports,
however, are highly atypical texts. They contain an enormous
number of shortened words and abbreviations.

When applying the common approach to structural analysis,
I found very early that in Czech medical
reports the meaning of special characters can be derived only
from their surroundings. Czech is a language with free
word order. The way medical reports are written is not
strictly standardised [6].

A sample of text from the objective findings section: „Akce pravidelná,
klidná, 2 ohr. ozvy. Břicho klidné, játra, sleziona nezv.,
tapot. nebol., jizva po CHE keloidní. Akné po trupu. DK
bez otoků a varixů.“ (Roughly: "Action regular, calm, 2 distinct
sounds. Abdomen calm, liver, spleen not enlarged, percussion
painless, keloid scar after cholecystectomy. Acne on the trunk.
Lower limbs without oedema and varices.")

The text above shows several typical properties
of Czech medical reports:

• Most sentences contain no verb, because it is evident
from context. In the first sentence, the subject is also
missing - it is the action of the heart.

• The second sentence contains a typo („sleziona“ instead of
„slezina“); medical reports are riddled with
typos.

• The four sentences contain four shortened
words and two abbreviations.

The issue of shortening words is not specific to Czech
medical reports. [7] reports that physicians of other specialties
are able to correctly interpret only about half of the
abbreviations and shortened words in use. Similar difficulties
are reported by [8], and in the field of law by [9].

Some parts can be identified correctly only from context.
For that reason I decided, in the structural analysis phase,
to standardise line endings (CR+LF to CR) and to transform
the input text into a chain of objects (I call them
containers); after this phase finishes, the
objects are of the following kinds:

• a string of (consecutive) alphanumeric
characters,

• another character (for which it is possible to record how many times
in a row the same character repeats).
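
The transformation into containers described above can be sketched as follows. This is an illustrative Python sketch, not the author's implementation: the class name `Container`, its fields, and the function `to_containers` are hypothetical, and "alphanumeric" is assumed to mean what Python's `str.isalnum` accepts (which includes Czech accented letters).

```python
from dataclasses import dataclass

@dataclass
class Container:
    kind: str       # "alnum" for an alphanumeric run, "char" for another character
    text: str       # the run itself, or the single repeated character
    count: int = 1  # number of consecutive repetitions (used for "char" only)

def to_containers(text):
    """Split the text into alphanumeric runs and (repeated) other characters."""
    text = text.replace("\r\n", "\r")  # standardise line endings as described above
    containers = []
    i = 0
    while i < len(text):
        ch = text[i]
        j = i + 1
        if ch.isalnum():
            while j < len(text) and text[j].isalnum():
                j += 1
            containers.append(Container("alnum", text[i:j]))
        else:
            while j < len(text) and text[j] == ch:
                j += 1
            containers.append(Container("char", ch, j - i))
        i = j
    return containers
```

For the fragment „2 ohr. ozvy“ this yields six containers: the runs "2", "ohr" and "ozvy", two spaces, and one full stop. Whether the full stop ends a sentence or an abbreviation is deliberately left undecided at this stage.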

To the resulting chain of objects I apply methods that
derive further kinds of objects from a sub-chain. I also
apply the methods to sub-chains formed from the newly obtained
objects. In this way I identify:

• numeric strings (unsigned integer) -
a number,

• separated numbers (always the combination: number
[separator number]+),

• a date in the format d.m.y (with or without spaces
after the dots),

• a birth number (check that the date exists, checksum
check for 10-digit numbers) - with or without
a slash.
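
The date and birth-number checks listed above could look roughly like the following Python sketch. It is not the author's code: the function names are hypothetical, a four-digit year is assumed for dates, and the birth-number rules used here (month offsets of 50 and 20, nine-digit numbers belonging to years before 1954, ten-digit numbers divisible by 11) are stated as assumptions about the format; the historical exception to the divisibility rule is not handled.

```python
import re
from datetime import date

def parse_date(token):
    """Parse 'd.m.yyyy' with optional spaces after the dots; None if invalid."""
    m = re.fullmatch(r"(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})", token)
    if not m:
        return None
    day, month, year = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:      # e.g. 31.2.2011 does not exist
        return None

def is_birth_number(token):
    """Check a Czech birth number written with or without the slash."""
    m = re.fullmatch(r"(\d{2})(\d{2})(\d{2})/?(\d{3,4})", token)
    if not m:
        return False
    yy, mm, dd = int(m.group(1)), int(m.group(2)), int(m.group(3))
    tail = m.group(4)
    if mm > 50:             # assumption: women have 50 added to the month
        mm -= 50
    if mm > 20:             # assumption: overflow series adds 20 to the month
        mm -= 20
    if len(tail) == 4:      # 10-digit number: checksum applies
        if int(token.replace("/", "")) % 11 != 0:
            return False
        year = 1900 + yy if yy >= 54 else 2000 + yy
    else:                   # 9-digit number: no checksum, assumed pre-1954
        year = 1900 + yy
    try:
        date(year, mm, dd)  # the encoded date must exist
    except ValueError:
        return False
    return True
```

In the pipeline described above, such checks would be applied to sub-chains of containers (for example number, slash, number) rather than to raw strings; plain strings are used here only to keep the sketch short.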

4. Lexical Analysis

The task of lexical analysis is to identify the individual
basic parts of the text, i.e. words, values, and the like.
Medical reports are a special kind of free text.
I therefore looked for a dictionary that I could use to
identify words.

I consider general Czech corpora unsuitable for this purpose,
because they are built from a different kind of discourse,
usually prose or newspaper articles. While searching, I
found that the database of the freely distributable iSpell
dictionary for automatic spell checking is under the GNU licence
(ensuring usability for scientific purposes), and that its
author had possible further uses of the dictionary in mind. The words
of this dictionary are arranged into several different files,
so a large number of names and titles can easily be
identified. The rules by which iSpell generates
further forms and derived words are written so that they
correspond to the formation of the individual parts of speech.
Medical reports contain a large number of technical
terms. For unshortened Czech words, the extended
iSpell dictionary let me identify
the part of speech without serious problems, although in many
cases not unambiguously. Proper names often
coincide with a common noun or adjective
(e.g. Dlouhý or Noha). If such words are
at the beginning of a sentence, the word cannot be
properly classified in the lexical analysis phase.

I then focused on identifying technical
terms, since one of the goals is to determine whether
anamnestic information can be obtained from a report, chiefly
information about diagnoses, allergies, and the results of biochemical
examinations. I found three classification
systems in total that could be used to identify
individual technical terms.

The first system tested was the English version of the
SNOMED CT classification system [10]. With this
classification system, it was possible to identify
terms that were not shortened and that have the same wording
in Czech and English. Given the specialty
of the input medical reports (cardiology), these were
the following specific terms: "diabetes mellitus" (SNOMED
CT 73211009) and the unit mmHg (SNOMED
CT 259018001). A Czech version of SNOMED CT does not exist,
among other reasons because it cannot exist: the Czech
Republic is not a member of the International Health Terminology
Standards Development Organisation (IHTSDO),
the owner of the SNOMED CT classification system. SNOMED
CT is therefore not usable for identifying medical
terms in free text.

The second classification system tested was the
International Classification of Diseases, version 10 (ICD-10,
MKN-10) in the Czech version [11]. This code list was a great
disappointment, since its translation was created only for manual
lookup by diagnosis code. Many translated
texts are composed of shortened words, and in
some cases one word is shortened in different
ways. In the following record the word
"diabetes" is shortened twice, each time differently: „Diabet.polyneuropat. při
diab.“. In some cases the text is hard to read because of the
shortening: „J.deg.on.oč.víčka a periok.kr.“.
Because words are shortened so often in MKN-10,
this classification system is not usable for identifying
technical terms in free text. Even if the words
were not shortened, however, given that MKN-10
contains only a list of diagnoses, this code list would not be
usable for identifying most clinical terms.

The third classification system tested was the bibliographic
classification system Medical Subject Headings
(MeSH) in the Czech version [12]. Using MeSH, it was possible
to identify on average about 10 terms per medical
report [13]. MeSH is not clinically oriented, and the
results corresponded to that. The identified terms
corresponded mainly to names of body parts, to a small extent
to measured parameters, and in one case to a diagnosis.
Truly technical terms thus remained unidentified.

5. Conclusion and Outlook

As stated above, I found that none of the available
classification systems is usable for identifying
technical terms. I am currently building, from a part of the
reports, a database of technical terms used in Czech
reports, mapped to UMLS concepts
[3]. Once the basic database is ready,
I will test its usability by applying it to the identification of
terms in all available reports.
Abstract

Searching for clinically valid information
in a large bibliographic database can be
time-consuming and hard work. We designed an
easy-to-use web service, Cardio Online Reader
(COR), specialized in the topic of cardiology.
As a source we use the PubMed database, adding
a simple filter and social functions for sharing the
content.


1. Introduction

The cumulative total of journal articles exceeded 50 million
in 2009 [1]. The most important freely accessible
resource of biomedical science articles is the PubMed
database, which is one of the key services provided by the
National Center for Biotechnology Information (NCBI).
The PubMed database contained 21067999 article citations
on 1 August 2011.

It is extremely difficult to find one's way around this
huge number of papers and to find the desired information.
This matters especially when the information sought
is clearly defined by a clinical domain, the required
time of publication, keywords, or authors.

The web interface of the PubMed database [2], accessible
at http://www.ncbi.nlm.nih.gov/pubmed/, offers
one HTML form field for searching for key terms in the
database. It puts the emphasis on search query syntax, since the
query should be defined precisely. When the search
query consists of one or two key terms, the search engine
often returns tens of thousands of results. The result list is
sorted by time in descending order, so the most recent
articles come first, but this order says nothing about
the qualitative parameters of the articles.


The NCBI web pages also offer an advanced
search tool for the PubMed database at
http://www.ncbi.nlm.nih.gov/pubmed/advanced.
In the PubMed Advanced Search, users can define the search
query using 39 parameters stored in the PubMed
database. Advanced Search saves a history of search
queries for each user. These queries can be retrieved
repeatedly.

The web interface of Advanced Search is more complicated
to use than the single form field in the basic search. Defining
the query is more time-consuming and requires
experience with search query formulation in order to obtain
high-quality results. The best information sources
provide relevant, valid material that can be accessed
quickly and with minimal effort [3].

2. The Cardio Online Reader Web Application 

Clinicians should obtain the information they want easily
and quickly. Our purpose was to simplify the process of
obtaining the articles sought in the stressful and
time-pressed conditions of clinical practice. We wanted
to allow clinical workers without experience with
advanced database search tools to utilize the possibilities
of large bibliographical databases. In the first phase
we decided to limit the clinical domain to Cardiology
and developed the Cardio Online Reader application.
This application is freely accessible at
http://neo.euromise.cz/cor.

2.1. The Cardio Online Reader Database 

The database of citations and abstracts of biomedical
science articles is the main part of the Cardio Online
Reader application (COR). This database uses the MySQL
database engine. The main data source for our project is
the PubMed database, which can be used free of charge.
Figure 1: An example of the COR application web interface showing the list of articles retrieved for the parameters set in the filter form fields above.


The data were imported by a query to the PubMed
database that defined the domain of Cardiology using
the most important MeSH terms from Cardiology.

We filtered out articles that do not fulfil our qualitative
criteria from the Evidence Based Medicine (EBM) point of
view. We selected only these types of articles:

- Randomized Controlled Trials,
- Systematic Reviews,
- Systematic Reviews with Meta-analysis,
- Guidelines,
- Practice Guidelines.
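
The exact import query is not reproduced in this paper. As a rough sketch only, a query of this kind could be assembled from a list of MeSH terms and a list of publication types using the PubMed field tags [MeSH Terms] and [Publication Type]. The helper name build_import_query and the example terms are illustrative assumptions, not the filter actually used for the COR.

```python
def build_import_query(mesh_terms, publication_types):
    """Combine MeSH terms and publication types into one PubMed search string.

    Both argument lists are hypothetical examples; the real COR filter
    is not given in the text.
    """
    # Any one of the MeSH terms places an article in the domain (OR) ...
    domain = " OR ".join(f'"{term}"[MeSH Terms]' for term in mesh_terms)
    # ... and the article must also be of one of the accepted types.
    types = " OR ".join(f'"{ptype}"[Publication Type]' for ptype in publication_types)
    return f"({domain}) AND ({types})"


query = build_import_query(
    ["Heart Failure", "Myocardial Infarction"],
    ["Randomized Controlled Trial", "Practice Guideline"],
)
print(query)
```

Keeping the domain terms and the evidence categories in two separate lists means either part of such a filter could be changed without touching the other.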

The result of this query was saved in an XML file.

The exported XML file was parsed by a single-purpose PHP
import script, and selected data fields (title, authors, MeSH
terms, abstract, unique identifier PMID, date of the
abstract publication in the MEDLINE database, link to
the full text, journal title) were saved to the COR
database.
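
The import step can be illustrated with a short sketch. The original script was written in PHP and is not shown here; the Python version below only demonstrates the idea of pulling the listed fields out of a PubMed-style XML export. The sample record (PMID, names, titles) is invented, and the element layout is assumed to follow the usual PubMed XML structure (PubmedArticle, MedlineCitation, MeshHeadingList).

```python
import xml.etree.ElementTree as ET

# Invented sample record; real exports contain many more elements.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>An example trial</ArticleTitle>
        <Abstract><AbstractText>Short abstract.</AbstractText></Abstract>
        <AuthorList>
          <Author><LastName>Novak</LastName><ForeName>Jan</ForeName></Author>
        </AuthorList>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Heart Failure</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_articles(xml_text):
    """Return one dictionary per PubmedArticle element in the export."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        authors = [
            f'{a.findtext("ForeName", default="")} {a.findtext("LastName", default="")}'.strip()
            for a in citation.findall("Article/AuthorList/Author")
        ]
        records.append({
            "pmid": citation.findtext("PMID"),
            "title": citation.findtext("Article/ArticleTitle"),
            "abstract": citation.findtext("Article/Abstract/AbstractText", default=""),
            "journal": citation.findtext("Article/Journal/Title"),
            "authors": authors,
            "mesh": [d.text for d in citation.findall("MeshHeadingList/MeshHeading/DescriptorName")],
        })
    return records


records = parse_articles(SAMPLE)
print(records[0]["pmid"], records[0]["authors"], records[0]["mesh"])
```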

The COR database is updated daily by an automatically
started PHP script, which reads an RSS channel of the
PubMed database created with the same query parameters
as the original import script. The update script uses
tools from the Entrez Programming Utilities [4] to gather
additional data for each article that are not part of
the RSS channel.
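
As an illustration of the second step, the sketch below builds a request to the EFetch utility of the Entrez Programming Utilities for a list of PMIDs. It is not the authors' PHP script; the PMIDs are placeholders and the function name efetch_url is an assumption. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode

# Base address of the EFetch utility.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an EFetch URL returning full PubMed records as XML."""
    params = {
        "db": "pubmed",                            # database to query
        "id": ",".join(str(p) for p in pmids),     # comma-separated PMIDs
        "retmode": "xml",                          # ask for XML output
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


url = efetch_url([12345, 67890])  # placeholder PMIDs
print(url)
```

Requesting several PMIDs in one call keeps the number of requests per daily update low.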

2.2. The Web Interface of the COR

In contrast to the original PubMed interface, we concentrated
on the fastest way of reducing the number of search
results while preserving the focus on results important
for clinical practice.

Users can limit search results with one mouse click to one
category of EBM quality of evidence. Users can also use
the six form fields of the filter on the home page of the
COR application to enter search criteria.
Figure 2: An example of the COR application web interface showing the detail of an article record.


The first form field is for entering a part of the text of
the article title or abstract. Another form field specifies
the requested author or authors. The third, large form field
is for entering parts or exact full terms of the Medical
Subject Headings (MeSH) thesaurus. Users can type in the
requested MeSH terms or choose them from a generated
MeSH Cloud or MeSH List, where terms are displayed
according to their frequency in articles or sorted
alphabetically.

In these form fields it is possible to use the logical
operators AND and OR. It is also possible to use a
dynamically generated autocomplete function in these
three fields to simplify entering exact phrases.
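
How such a field might be evaluated can be sketched as follows. The paper does not describe operator precedence or matching rules, so the sketch assumes that AND binds more tightly than OR, that there are no parentheses, and that terms are matched as case-insensitive substrings. The function name matches is hypothetical.

```python
def matches(query, text):
    """Return True if `text` satisfies a query built from AND and OR.

    Assumptions (not stated in the paper): AND binds tighter than OR,
    no parentheses, case-insensitive substring matching.
    """
    haystack = text.lower()
    # The query is true if at least one OR-alternative is true ...
    for alternative in query.split(" OR "):
        terms = [term.strip().lower() for term in alternative.split(" AND ")]
        # ... and an alternative is true only if all its AND-terms occur.
        if all(term and term in haystack for term in terms):
            return True
    return False


print(matches("heart failure AND aspirin OR statin", "Aspirin in heart failure"))
print(matches("heart failure AND aspirin", "Statins in heart failure"))
```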

Users can limit the list of search results by setting the
earliest and the latest date of publication of the article
in the MEDLINE database in the next two form fields. The
dates can be set manually or chosen from a JavaScript
date picker.


The last form field is for the manual choice of the
category of EBM quality of evidence.

For a fast choice of the most frequent MeSH terms and
their insertion into the filter, there is a "MeSH cloud" in
the right part of the application web page, where the
listed terms differ in text size according to the frequency
of each term in the database. Users can also use the list
of their last search queries.
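
The text-size scaling of such a cloud can be sketched as a linear mapping from term frequency to font size. The pixel limits, the term counts and the function name cloud_sizes are assumptions made for illustration; the paper does not state which scaling the COR uses.

```python
def cloud_sizes(term_counts, min_px=11, max_px=28):
    """Map each term's frequency linearly onto a font size in pixels."""
    lowest = min(term_counts.values())
    highest = max(term_counts.values())
    span = (highest - lowest) or 1  # avoid division by zero when all counts are equal
    return {
        term: round(min_px + (count - lowest) / span * (max_px - min_px))
        for term, count in term_counts.items()
    }


# Hypothetical frequencies of three MeSH terms in the database.
sizes = cloud_sizes({"Heart Failure": 120, "Hypertension": 60, "Stroke": 20})
print(sizes)
```

A logarithmic mapping would be a common alternative when a few terms are far more frequent than the rest.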


2.3. Search Results 

The COR application displays search results matching the
entered parameters below the filter. Search results are
sorted in descending order by the date of publication in the
MEDLINE database. The simple list of results shows the
article title, names of authors and the date of publication
in the MEDLINE database. There can be a maximum of
15 results on one page; the user can browse through the
result pages. The EBM category of an article is
distinguished by a graphical icon.
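
The ordering and paging described above can be sketched in a few lines. The page size of 15 comes from the text; the record layout, the invented article data and the function name results_page are assumptions.

```python
from datetime import date

PAGE_SIZE = 15  # maximum number of results per page, as stated in the text


def results_page(articles, page):
    """Return the given 1-based page of results, newest articles first."""
    ordered = sorted(articles, key=lambda a: a["published"], reverse=True)
    start = (page - 1) * PAGE_SIZE
    return ordered[start:start + PAGE_SIZE]


# 31 invented articles, one per day of January 2011.
articles = [{"title": f"Article {i}", "published": date(2011, 1, i)} for i in range(1, 32)]
first = results_page(articles, 1)
last = results_page(articles, 3)
print(len(first), first[0]["title"], len(last), last[0]["title"])
```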
By clicking on the article title in the list of results, the
user can navigate to the detail page of the article record.
The detail page shows the article title, names of authors,
list of assigned MeSH terms, PMID identifier, link to the
original record in the PubMed database, link to the full
text of the article (if available on the Internet), article
abstract, journal title and the date of publication in the
MEDLINE database. The user of the COR application
can rate the helpfulness of the article on a scale from
one to five star symbols. This rating is linked to the IP
address, so one user can rate a single article only
once. The user can also attach a comment to the article.
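
One way to enforce the one-rating-per-address rule is a uniqueness constraint in the database. The sketch below uses an in-memory SQLite database as a stand-in for the MySQL database of the COR; the table and column names, the function name rate and the sample values are assumptions, not the actual schema.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the COR's MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rating ("
    " pmid INTEGER NOT NULL,"
    " ip TEXT NOT NULL,"
    " stars INTEGER NOT NULL CHECK (stars BETWEEN 1 AND 5),"
    " PRIMARY KEY (pmid, ip))"  # one rating per article and IP address
)


def rate(pmid, ip, stars):
    """Store a rating; return False if it is a repeat or out of range."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO rating (pmid, ip, stars) VALUES (?, ?, ?)",
                (pmid, ip, stars),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(rate(12345, "192.0.2.1", 4))  # first rating from this address
print(rate(12345, "192.0.2.1", 5))  # second attempt from the same address
```

Identifying users by IP address is simple but coarse: several users behind one shared address count as a single rater.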

2.4. Web 2.0 Social Functions

The second big task of the COR application is to allow easy
sharing of clinically important search results with
colleagues, friends and the professional community. The
detail of an article, including the abstract and
bibliographical data, can be shared via e-mail, Facebook,
Twitter and other social networks included in the
"Share this" web service [5].

Besides the detail of a particular scientific article, the
COR offers an easy way to share a link to the application
itself via the "Share this" service, Facebook, Twitter,
e-mail or one of the 21 most common social and bookmarking
services such as Digg, Delicious, Reddit, Yahoo! or
Google Bookmarks.

Users can follow the COR's own profiles on Facebook and
Twitter, its Blogger account and its YouTube channel. Users
can subscribe to RSS channels with the last 20 articles or
the last 20 comments, either overall or individually for
each EBM category of articles.
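
A channel of this kind can be sketched as a minimal RSS 2.0 document. The limit of 20 items follows the text; the channel title, the description, the item links and the function name build_rss are illustrative assumptions, and the input list is assumed to be sorted newest first.

```python
import xml.etree.ElementTree as ET


def build_rss(articles, limit=20):
    """Build a minimal RSS 2.0 feed from a newest-first list of articles."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "COR - latest articles"  # assumed title
    ET.SubElement(channel, "link").text = "http://neo.euromise.cz/cor"
    ET.SubElement(channel, "description").text = "Latest articles in the COR database"
    for article in articles[:limit]:  # keep only the newest `limit` entries
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = article["title"]
        ET.SubElement(item, "link").text = article["link"]
    return ET.tostring(rss, encoding="unicode")


# 25 invented articles with placeholder links.
sample = [{"title": f"Article {i}", "link": f"https://example.org/article/{i}"} for i in range(25)]
feed = build_rss(sample)
print(feed.count("<item>"))
```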

2.5. Future Plans and Improvements 

We plan further improvements and simplifications of the
web interface of the COR application. One thing that could
speed up the use of the filter and make the work more
illustrative is a graphical slider with a time plot showing
the number of articles published in discrete time periods,
which would allow these periods to be selected in the filter.

A long-term problem is to optimise the autocomplete
function in the three form fields of the filter so that
users can insert key terms as easily as possible. This
process should be evaluated in cooperation with ordinary
users.

Geotagging can help to make search results more
regionally oriented. Metadata contained in the PubMed
database can show in which country an article was
published. Geographical information in the "Affiliation"
field is even more interesting: it makes it possible to find
out where the article was created and what population it
describes. We can display this information on a map or
allow results to be limited by it in the filter.

We expect to individualize the web interface for
registered users in the future development of the COR
application. After registering and logging in, a user could
browse the history of his or her own search queries,
create lists of favourite articles, have the system send
notifications of certain events in the database, define
custom RSS channels, or add authorized comments and ratings.

The COR application can serve more than the users of the
web interface or of RSS readers. By creating an XML data
interface we can connect other information systems
and send them search results or record details on
demand. A possible service for hospital information systems
would be to offer documents relevant to a concrete
clinical situation defined by MeSH and geographical
terms.

2.6. Discussion 

The widely accepted PubMed database of biomedical citations
has a freely accessible web interface with a basic and an
advanced version of the search. Other web services can
also be used to search for scientific articles by clinical
terms or other parameters. These services are more general
(Google, Google Scholar) or focused on natural
sciences (Scopus). Why create another search tool?

The number of scientific articles indexed in electronic
databases is increasing steeply. The current question
"where to find" will give way to the questions "how to
search" and "how to search the easiest way". The COR offers
a simple and fast way to search the PubMed database
for articles in the field of Cardiology with a focus on the
highest level of evidence. It covers only one thousandth of
the PubMed database and provides easy-to-use tools for
setting the search query, which can yield a small number
of articles appropriate to the clinical need.

There are other web services specialized in searching
large databases of scientific bibliography
(http://demos.vivisimo.com, http://www.tripdatabase.com,
http://www.pubmeddy.com - discontinued). The COR is unique
in its focus on one domain (Cardiology), on a few defined
EBM categories most important for clinical practice, and
in its simplicity of use.

The key question for the progress of the COR application
will be the interest of the expert medical community.
The COR contains tools for sharing scientific information
between experts, tools for subjective rating of its
quality and tools for expert discussion. Experts
could be motivated by functions for registered users,
individualized search functions, the ease of use and the
fact that this application is free of charge.

The COR application is designed specifically for Cardiology.
The filters used for extraction from the PubMed database
are firmly defined. The same technology could be used for
other single-purpose (single expert domain) web portals.
Similar filters could also be set individually for
registered users, so the scope could be widened to other
domains.


Name of service                              Number of results

Google.com                                   approx. 66,000,000
Google.com last year                         approx. 288,000,000
Google Scholar                               approx. 1,700,000
Scopus                                       243,866
Scopus - only Health Sciences                179,511
PubMed                                       144,097
COR                                          1,695
COR - Practice Guidelines                    79
COR - Practice Guidelines in last 5 years    33


Table 1: Comparison of the number of search results in different web services when searching for the MeSH term "heart failure".
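
To make the reduction in Table 1 more tangible, the counts can be expressed as shares of the PubMed result. The short calculation below uses only the figures from the table; the variable names are arbitrary. For this query the COR returns roughly 1.2 % of the PubMed results.

```python
# Result counts for the MeSH term "heart failure", taken from Table 1.
results = {
    "PubMed": 144_097,
    "COR": 1_695,
    "COR - Practice Guidelines": 79,
    "COR - Practice Guidelines in last 5 years": 33,
}

# Share of each result set relative to the full PubMed result.
share = {name: count / results["PubMed"] for name, count in results.items()}

for name, value in share.items():
    print(f"{name}: {value:.3%}")
```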


3. Conclusion 

We created the Cardio Online Reader application for
easy searching for clinically relevant scientific articles
in the field of Cardiology. This application is freely
accessible at http://neo.euromise.cz/cor. The PubMed
database is the main data source for our application.

The application contains a filter consisting of six form
fields. Search results are sorted in descending order by
the date of publication in the MEDLINE database. The
detail page shows the article title, names of authors, list
of assigned MeSH terms, PMID identifier, link to the
original record in the PubMed database, link to the full
text of the article (if available on the Internet), article
abstract, journal title and the date of publication in the
MEDLINE database.

The COR application offers easy access to services
for content sharing, such as the "Share this" service,
social and bookmarking services, comments and ratings,
and sharing via e-mail.